Preface



The Portuguese Association for Artificial Intelligence (APPIA) has been
regularly organising the Portuguese Conference on Artificial Intelligence (EPIA).
This ninth conference follows previous ones held in Porto (1985), Lisboa (1986),
Braga (1987), Lisboa (1989), Albufeira (1991), Porto (1993), Funchal (1995)
and Coimbra (1997). Starting in 1989, the conferences have been held biennially
(alternating with an APPIA Advanced School on Artificial Intelligence) and
have become truly international: English has been adopted as the official
language and the proceedings are published in Springer's LNAI series.

The conference has reconfirmed its high international standard this year,
largely due to its programme committee, composed of distinguished researchers
in a variety of specialities in Artificial Intelligence, half of them from Portuguese
universities. This has attracted significant international interest, reflected
in the 66 papers submitted from 17 different countries, 29 of them by
Portuguese researchers.

Of the 66 papers submitted, about one third (23) were selected for oral
presentation and have been published in this volume. The review process
enabled the selection of high quality papers, each paper being reviewed by two or
three reviewers, either members of the programme committee or reviewers
appointed by them. We would like to thank all of the reviewers for their
excellent and hard work.

We would also like to thank the invited speakers, Pascal Van Hentenryck,
Nada Lavrač and Klaus Fischer, for their contribution to EPIA'99, which has
significantly boosted interest in the conference. Their invited lectures are
presented in a separate section of this volume. We should also thank Pavel Brazdil
and Terrance Swift for their tutorials.

As in previous EPIA conferences, two workshops were organised in association
with this one, one in the area of Data Mining (EKDB-99) and the other in
Natural Language Processing (PROPOR-99). We would like to thank the
organisers for their efforts, in particular, Fernando Moura Pires, Gabriela Guimarães
and Alípio Jorge (EKDB-99), and Gabriel Pereira Lopes, Irene Pimenta
Rodrigues and Paulo Quaresma (PROPOR-99). Finally, we would like to thank all
the people who helped us organise the conference, and in particular, Jorge Cruz
for his help in the publication of this volume.

The conference would not have been possible without the support of a number
of institutions, which we would like to publicly acknowledge: Fundação para a
Ciência e Tecnologia, Fundação Calouste Gulbenkian, ESPRIT NoE CompulogNet,
Logic Programming Associates, Alcatel Portugal, Eastecnica and Valnet
Sado. We would also like to thank the Departamento de Informática da
Universidade Nova de Lisboa and the Departamento de Matemática da Universidade
de Évora for their support in the organisation of the conference.
Abstract. OPL is a modeling language for mathematical programming
and combinatorial optimization problems. It is the first modeling language
to combine high-level algebraic and set notations from modeling languages
with a rich constraint language and the ability to specify search
procedures and strategies that are the essence of constraint programming.
In addition, OPL models can be controlled and composed using
OPLScript, a script language that simplifies the development of applications
that solve sequences of models, several instances of the same model,
or a combination of both as in column-generation applications. Finally,
OPL models can be embedded in larger applications through C++ code
generation. This paper presents an overview of these functionalities on a
scheduling application.



1 Introduction 

Combinatorial optimization problems are ubiquitous in many practical appli- 
cations, including scheduling, resource allocation, planning, and configuration 
problems. These problems are computationally difficult (i.e., they are NP-hard) 
and require considerable expertise in optimization, software engineering, and the 
application domain. 

The last two decades have witnessed substantial development in tools to
simplify the design and implementation of combinatorial optimization problems.
Their goal is to decrease development time substantially while preserving most
of the efficiency of specialized programs. Most tools can be classified in two
categories: mathematical modeling languages and constraint programming
languages. Mathematical modeling languages such as AMPL [S] and GAMS [I]
provide very high-level algebraic and set notations to express concisely
mathematical problems that can then be solved using state-of-the-art solvers. These
modeling languages do not require specific programming skills and can be used
by a wide audience. Constraint programming languages such as CHIP [1],
Prolog III and its successors [2], OZ [S], and Ilog Solver [7] have orthogonal
strengths. Their constraint languages, and their underlying solvers, go beyond
traditional linear and nonlinear constraints and support logical, higher-order, and
global constraints. They also make it possible to program search procedures to
specify how to explore the search space. However, these languages are mostly
aimed at computer scientists and often have weaker abstractions for algebraic
and set manipulation.

The work described in this paper originated as an attempt to unify modeling
and constraint programming languages and their underlying implementation
technologies. It led to the development of the optimization programming
language OPL [10], its associated script language OPLScript [?], and its development
environment OPL Studio.

OPL is a modeling language sharing high-level algebraic and set notations
with traditional modeling languages. It also contains some novel functionalities
to exploit sparsity in large-scale applications, such as the ability to index arrays
with arbitrary data structures. OPL shares with constraint programming
languages their rich constraint languages, their support for scheduling and resource
allocation problems, and the ability to specify search procedures and strategies.
OPL also makes it easy to combine different solver technologies for the same
application.

OPLScript is a script language for composing and controlling OPL models.
Its motivation comes from the many applications that require solving several
instances of the same problem (e.g., sensitivity analysis), sequences of models, or
a combination of both as in column-generation applications. OPLScript supports
a variety of abstractions to simplify these applications, such as OPL models as
first-class objects, extensible data structures, and linear programming bases, to
name only a few.

OPL Studio is the development environment of OPL and OPLScript. Beyond
support for the traditional "edit, execute, and debug" cycle, it provides
automatic visualizations of the results (e.g., Gantt charts for scheduling applications),
visual tools for debugging and monitoring OPL models (e.g., visualizations of the
search space), and C++ code generation to integrate an OPL model in a larger
application. The code generation produces a class for each model object and
makes it possible to add/remove constraints dynamically and to override the
search procedure.

The purpose of this paper is to illustrate how to solve combinatorial
optimization problems in OPL Studio using a scheduling application. It is of course
impossible to cover even a reasonable fraction of the features available in OPL and
OPLScript, but the hope is to convey a flavor of these languages and an overview
of the overall approach. See [?] for a companion paper describing the constraint
programming features of OPL. The rest of this paper is organized as follows.
Section 2 describes the OPL model for the scheduling application. Section 3 shows how
OPLScript can control the models, while Section 4 describes C++ code generation.
All the models/scripts/programs can be run in ILOG OPL Studio 2.1.

2 The Modeling Language OPL 

This section illustrates OPL on a scheduling application. The application
demonstrates various modeling concepts of OPL as well as some of the OPL support for
scheduling applications. In particular, the application illustrates the concepts
of activities, unary resources, discrete resources, state resources, and transition
times, giving a preliminary understanding of the rich support of OPL in this
important area. To ease understanding, the application is presented in stepwise
refinements starting with a simplified version of the problem and adding more
sophisticated concepts incrementally.

2.1 The Basic Model 

Consider an application where a number of jobs must be performed in a shop 
equipped with a number of machines. Each job corresponds to the processing 
of an item that needs to be sequentially processed on a number of machines 
for some known duration. Each item is initially available in area A of the shop. 
It must be brought to the specified machines with a trolley. After it has been 
processed on all machines, it must be stored in area S of the shop. Moving an 
item from area x to area y consists of (1) loading the item on the trolley at area 
x; (2) moving the trolley from area x to area y and (3) unloading the item from 
the trolley at area y. The goal is to find a schedule minimizing the makespan. 
In this version of the problem, we ignore the time to move the trolley and we 
assume that the trolley has unlimited capacity. Subsequent sections will remove 
these limitations. 

The specific instance considered here consists of 6 jobs, each of which requires
processing on two specified machines. As a consequence, a job consists of 8 tasks:

1. load the item on the trolley at area A; 

2. unload the item from the trolley at the area of the first machine required by 
the job; 

3. process the item on the first machine; 

4. load the item on the trolley at the area of this machine; 

5. unload the item from the trolley at the area of the second machine required 
by the job; 

6. process the item on the second machine; 

7. load the item on the trolley at the area of this machine; 

8. unload the item from the trolley at area S.

Figures 1 and 2 depict an OPL model for this problem, while Figure 3 describes
the instance data. This separation between model and data is an important
feature of modeling languages.

The statement starts by defining the set of jobs, the set of tasks to be
performed by the jobs, and the possible locations of the trolley. As can be seen in
the instance data (see Figure 3), the tasks correspond to the description given
previously. The trolley has five possible locations, one for each available machine,
one for the arrival area, and one for the storage area. The statement then defines
the machines and the data for the jobs, i.e., it specifies the two machines required
for each job and the duration of the activities to be performed on these
machines. The machines are identified by their locations for simplicity. The statement
also specifies the duration of a loading task, which concludes the description of
the input data.
   enum Jobs ...;
   enum Tasks ...;
   enum Location ...;
   {Location} Machines = ...;
   struct jobRecord {
      Location machine1;
      int durations1;
      Location machine2;
      int durations2;
   };
   jobRecord job[Jobs] = ...;
   int loadDuration = ...;

   Location location[Jobs,Tasks];
   initialize {
      forall(j in Jobs) {
         location[j,loadA] = areaA;
         location[j,unload1] = job[j].machine1;
         location[j,process1] = job[j].machine1;
         location[j,load1] = job[j].machine1;
         location[j,unload2] = job[j].machine2;
         location[j,process2] = job[j].machine2;
         location[j,load2] = job[j].machine2;
         location[j,unloadS] = areaS;
      };
   };

   int duration[Jobs,Tasks];
   initialize {
      forall(j in Jobs) {
         duration[j,loadA] = loadDuration;
         duration[j,unload1] = loadDuration;
         duration[j,process1] = job[j].durations1;
         duration[j,load1] = loadDuration;
         duration[j,unload2] = loadDuration;
         duration[j,process2] = job[j].durations2;
         duration[j,load2] = loadDuration;
         duration[j,unloadS] = loadDuration;
      };
   };



Fig. 1. The Trolley Problem: Part I. 
   scheduleHorizon = 2000;
   UnaryResource machine[Machines];
   StateResource trolley(Location);
   Activity act[i in Jobs, j in Tasks](duration[i,j]);
   Activity makespan(0);

   minimize
      makespan.end
   subject to {
      forall(j in Jobs & ordered t1, t2 in Tasks)
         act[j,t1] precedes act[j,t2];
      forall(j in Jobs) {
         act[j,process1] requires machine[job[j].machine1];
         act[j,process2] requires machine[job[j].machine2];
      };
      forall(j in Jobs, t in Tasks : t <> process1 & t <> process2)
         act[j,t] requiresState(location[j,t]) trolley;
      forall(j in Jobs)
         act[j,unloadS] precedes makespan;
   };

   search {
      setTimes(act);
   };



Fig. 2. The Trolley Problem: Part II. 



   Jobs = {j1, j2, j3, j4, j5, j6};
   Tasks = {loadA, unload1, process1, load1, unload2, process2, load2, unloadS};
   Location = {m1, m2, m3, areaA, areaS};
   Machines = { m1, m2, m3 };
   job = [
      <m1,80,m2,60>, <m2,120,m3,80>, <m2,80,m1,60>,
      <m1,160,m3,100>, <m3,180,m2,80>, <m2,140,m3,60> ];
   loadDuration = 20;



Fig. 3. The Trolley Problem: the Instance Data. 
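The derived tables of Figure 1 can be mirrored outside OPL to see what the two initialize blocks compute. The following Python sketch is an illustration only, not part of the OPL model: it transcribes the instance data of Figure 3 and builds the location and duration of every task; the names derive_tables, JOB and LOAD_DURATION are hypothetical.

```python
# Instance data transcribed from Figure 3 (names below are illustrative).
TASKS = ["loadA", "unload1", "process1", "load1",
         "unload2", "process2", "load2", "unloadS"]

# job[j] = (first machine, first duration, second machine, second duration)
JOB = {
    "j1": ("m1", 80, "m2", 60),
    "j2": ("m2", 120, "m3", 80),
    "j3": ("m2", 80, "m1", 60),
    "j4": ("m1", 160, "m3", 100),
    "j5": ("m3", 180, "m2", 80),
    "j6": ("m2", 140, "m3", 60),
}
LOAD_DURATION = 20


def derive_tables(job, load_duration):
    """Return (location, duration) dictionaries indexed by (job, task)."""
    location, duration = {}, {}
    for j, (mach1, dur1, mach2, dur2) in job.items():
        # One entry per task, in the order of TASKS.
        places = ["areaA", mach1, mach1, mach1, mach2, mach2, mach2, "areaS"]
        times = [load_duration, load_duration, dur1, load_duration,
                 load_duration, dur2, load_duration, load_duration]
        for task, place, time in zip(TASKS, places, times):
            location[j, task] = place
            duration[j, task] = time
    return location, duration


location, duration = derive_tables(JOB, LOAD_DURATION)
```

Indexing the dictionaries by the pair (job, task) plays the role of the two-dimensional arrays location[Jobs,Tasks] and duration[Jobs,Tasks] in the model.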
The remaining instructions in Figure 1 specify derived data that are useful
in stating the constraints. The instruction

   Location location[Jobs,Tasks];
   initialize {
      forall(j in Jobs) {
         location[j,loadA] = areaA;
         location[j,unload1] = job[j].machine1;
         location[j,process1] = job[j].machine1;
         location[j,load1] = job[j].machine1;
         location[j,unload2] = job[j].machine2;
         location[j,process2] = job[j].machine2;
         location[j,load2] = job[j].machine2;
         location[j,unloadS] = areaS;
      };
   };

specifies the locations where each task of the application must take place, while
the next two instructions specify the durations of all tasks. The subsequent
instructions, shown in Figure 2, are particularly interesting. The instruction

   scheduleHorizon = 2000;

specifies that the schedule horizon is 2000, i.e., that all tasks must be completed 
by that date. The instruction 

   UnaryResource machine[Machines];

declares the machines of this application. Machines are unary resources, which
means that no two tasks can be scheduled at the same time on them. The
implementation of OPL uses efficient scheduling algorithms for reasoning about
these resources, including the edge-finder algorithm [?]. Note also that the array
machine is indexed by a set of values. In fact, arrays in OPL can be indexed by
arbitrary data structures (e.g., a set of records), which is important to exploit
sparsity in large-scale applications and to simplify modeling. The instruction

   StateResource trolley(Location);

defines the trolley as a state resource whose states are the five possible locations
of the trolley. A state resource is a resource that can only be in one state at
any given time; hence any two tasks requiring a different state cannot overlap
in time. The instructions

   Activity act[i in Jobs, j in Tasks](duration[i,j]);
   Activity makespan(0);

define the decision variables for this problem. They associate an activity with
each task of the application and an activity to model the makespan. An activity
in OPL consists of three variables, a starting date, an ending date, and a duration,
and the constraints linking them. Note also how the subscripts i and j are used in
the declaration to associate the proper duration with every task. These generic
declarations are often useful to simplify problem description. The rest of the
statement specifies the objective function and the constraints. The objective
function consists of minimizing the end date of the makespan activity. Note that
the starting date, the ending date, and the duration of an activity are all accessed
as fields of records (or instance variables of objects). The instruction

   forall(j in Jobs & ordered t1, t2 in Tasks)
      act[j,t1] precedes act[j,t2];

specifies the precedence constraints inside a job. It also illustrates the rich
aggregate operators in OPL. The instruction

   forall(j in Jobs) {
      act[j,process1] requires machine[job[j].machine1];
      act[j,process2] requires machine[job[j].machine2];
   };



specifies the unary resource constraints, i.e., it specifies which task uses which
machine. The OPL implementation collects all these constraints, which can then be
used inside the edge-finder algorithm. The instruction

   forall(j in Jobs, t in Tasks : t <> process1 & t <> process2)
      act[j,t] requiresState(location[j,t]) trolley;

specifies the state resource constraints for the trolley, i.e., it specifies which tasks
require the trolley to be at a specified location. The instruction

   forall(j in Jobs)
      act[j,unloadS] precedes makespan;

makes sure that the makespan activity starts only when all the other tasks are
completed. Finally, note the instruction

   search {
      setTimes(act);
   };

that specifies the search procedure. It illustrates that OPL supports user-defined
search procedures. The search procedure in this model is rather simple and uses
a procedure setTimes(act) that assigns a starting date to every task in the
array act by exploiting dominance relationships. The solution produced by OPL
for this application is of the form

   act[j1,loadA] = [0 -- 20 --> 20]
   act[j1,unload1] = [40 -- 20 --> 60]
   act[j1,process1] = [60 -- 80 --> 140]
   act[j1,load1] = [140 -- 20 --> 160]
   act[j1,unload2] = [160 -- 20 --> 180]
   act[j1,process2] = [380 -- 60 --> 440]
   act[j1,load2] = [440 -- 20 --> 460]
   act[j1,unloadS] = [460 -- 20 --> 480]
   ...
   act[j6,unloadS] = [540 -- 20 --> 560]
   makespan = [560 -- 0 --> 560]

It displays the starting date, the duration, and the completion time of each 
activity in the model. 
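The displayed fragment can be checked against two properties stated earlier: an activity links its start, duration, and end, and the tasks of a job must follow one another in the order of the Tasks enumeration. The Python sketch below is a hedged illustration of these checks on the values shown for job j1; the Activity tuple and the helper names are hypothetical and do not reflect how OPL represents activities internally.

```python
from collections import namedtuple
from itertools import combinations

# Hypothetical record mirroring the three variables of an activity.
Activity = namedtuple("Activity", "start duration end")

TASKS = ["loadA", "unload1", "process1", "load1",
         "unload2", "process2", "load2", "unloadS"]

# Values displayed for job j1 in the solution above.
J1 = {
    "loadA": Activity(0, 20, 20),
    "unload1": Activity(40, 20, 60),
    "process1": Activity(60, 80, 140),
    "load1": Activity(140, 20, 160),
    "unload2": Activity(160, 20, 180),
    "process2": Activity(380, 60, 440),
    "load2": Activity(440, 20, 460),
    "unloadS": Activity(460, 20, 480),
}


def is_consistent(activity):
    """The constraint linking the three variables: start + duration = end."""
    return activity.start + activity.duration == activity.end


def respects_precedence(schedule, order):
    """For every ordered pair t1 < t2, t1 must end before t2 starts."""
    return all(schedule[t1].end <= schedule[t2].start
               for t1, t2 in combinations(order, 2))


ok = all(is_consistent(a) for a in J1.values()) and respects_precedence(J1, TASKS)
```

Using combinations over the ordered task list mirrors the "ordered t1, t2 in Tasks" aggregate, which ranges over all pairs with t1 before t2 rather than only over consecutive tasks.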

2.2 Transition Times 

Assume now that the time to move the trolley from one area to another must
be taken into account. This new requirement imposes transition times between
successive activities. In OPL, transition times can be specified between any two
activities requiring the same unary or state resource. Given two activities a and
b, the transition time between a and b is the amount of time that must elapse
between the end of a and the beginning of b when a precedes b. Transition times
are modelled in two steps in OPL. First, a transition type is associated with each
activity. Second, a transition matrix is associated with the appropriate state or
unary resource. To determine the transition time between two successive
activities a and b on a resource r, the transition matrix is indexed by the transition
types of a and b.

In the trolley application, since the transition times depend on the trolley
location, the key idea is that each activity may be associated with a transition
type that represents the location where the activity is taking place. For instance,
task unload1 of job j1 is associated with state m1 if the first machine of j1 is
machine 1. The state resource can be associated with a transition matrix that,
given two locations, returns the time to move from one to the other. The model
shown in the previous section can thus be enhanced easily by adding a declaration

   int transition[Location,Location] = ...;

and by modifying the state resource and activity declarations to become

   StateResource trolley(Location,transition);
   UnaryResource machine[Machines];
   Activity act[i in Jobs, j in Tasks](duration[i,j])
      transitionType location[i,j];

Using a transition matrix of the form

   [
    [  0,  50,  60,  50,  90 ],
    [ 50,   0,  60,  90,  50 ],
    [ 60,  60,   0,  80,  80 ],
    [ 50,  90,  80,   0, 120 ],
    [ 90,  50,  80, 120,   0 ]
   ];

would lead to an optimal solution of the form

   act[j1,loadA] = [0 -- 20 --> 20]
   act[j1,unload1] = [70 -- 20 --> 90]
   act[j1,process1] = [90 -- 80 --> 170]
   act[j1,load1] = [370 -- 20 --> 390]
   act[j1,unload2] = [530 -- 20 --> 550]
   act[j1,process2] = [550 -- 60 --> 610]
   act[j1,load2] = [850 -- 20 --> 870]
   act[j1,unloadS] = [920 -- 20 --> 940]
   ...
   act[j6,unloadS] = [920 -- 20 --> 940]
   makespan = [940 -- 0 --> 940]
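The definition of a transition time can be checked on the trolley tasks of job j1 in this solution. The Python sketch below is an illustration under two assumptions: the rows and columns of the matrix follow the order of the Location enumeration (m1, m2, m3, areaA, areaS), and job j1 uses machines m1 and m2 as in the instance data. The helper names are hypothetical.

```python
# Assumed row/column order: the Location enumeration of the instance data.
LOCATIONS = ["m1", "m2", "m3", "areaA", "areaS"]

TRANSITION = [
    [0, 50, 60, 50, 90],
    [50, 0, 60, 90, 50],
    [60, 60, 0, 80, 80],
    [50, 90, 80, 0, 120],
    [90, 50, 80, 120, 0],
]


def transition_time(src, dst):
    """Time needed to move the trolley from location src to location dst."""
    return TRANSITION[LOCATIONS.index(src)][LOCATIONS.index(dst)]


def compatible(first, second):
    """first and second are (start, end, location) triples, first preceding
    second on the trolley; second may start only after the transition."""
    return second[0] >= first[1] + transition_time(first[2], second[2])


# Trolley tasks of job j1 (loads and unloads) taken from the solution above.
J1_TROLLEY = [
    (0, 20, "areaA"),     # loadA
    (70, 90, "m1"),       # unload1
    (370, 390, "m1"),     # load1
    (530, 550, "m2"),     # unload2
    (850, 870, "m2"),     # load2
    (920, 940, "areaS"),  # unloadS
]

all_ok = all(compatible(a, b) for a, b in zip(J1_TROLLEY, J1_TROLLEY[1:]))
```

Under these assumptions, unload1 starts at 70 because loadA ends at 20 and the move from areaA to m1 takes 50, and unloadS starts at 920 because load2 ends at 870 and the move from m2 to areaS takes 50.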

2.3 Capacity Constraints 

Consider now adding the requirement that the trolley has a limited capacity,
i.e., it can only carry so many items. To add this requirement in OPL, it is
necessary to model the trolley by two resources: a state resource as before and
a discrete resource that represents its capacity. Several activities can require
the same discrete resource at a given time provided that their total demand
does not exceed the capacity. In addition, it is necessary to model the tasks
of moving from one location to another. As a consequence, each job is enhanced
by three activities that represent the move from area A to the first machine,
from the first machine to the second machine, and from the second machine to
area S. Each of these trolley activities uses one capacity unit of the trolley. The
declarations

   int trolleyMaxCapacity = 3;
   DiscreteResource trolleyCapacity(trolleyMaxCapacity);
   enum TrolleyTasks {onTrolleyA1, onTrolley12, onTrolley2S};
   Activity tact[Jobs,TrolleyTasks];

serve that purpose. It is now important to state that these activities require the
trolley capacity and when these tasks must be scheduled. The constraints

   forall(j in Jobs, t in TrolleyTasks)
      tact[j,t] requires trolleyCapacity;

specify the resource consumption, while the constraints

   forall(j in Jobs) {
      tact[j,onTrolleyA1].start = act[j,loadA].start;
      tact[j,onTrolleyA1].end = act[j,unload1].end;
      tact[j,onTrolley12].start = act[j,load1].start;
      tact[j,onTrolley12].end = act[j,unload2].end;
      tact[j,onTrolley2S].start = act[j,load2].start;
      tact[j,onTrolley2S].end = act[j,unloadS].end;
   };
specify the temporal relationships, e.g., that the activity of moving from area
A to the first machine in a job starts when the item is being loaded on
the trolley and ends when the item is unloaded. The trolley application
is now completed and the final model is depicted in Figures 4 and 5. This last
model is in fact rather difficult to solve optimally despite its reasonable size.
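The capacity requirement on a discrete resource can be illustrated with a simple sweep over the start and end dates of the trolley activities. The Python sketch below is only an illustration of the condition that the total demand never exceeds the capacity; it assumes unit demand per activity and half-open intervals (an item unloaded at time t frees its place for an item loaded at time t), and it uses made-up intervals rather than values from the paper.

```python
def max_load(intervals):
    """Peak number of simultaneously active (start, end) intervals,
    each consuming one unit of the resource."""
    events = []
    for start, end in intervals:
        events.append((start, 1))   # one unit is taken at the start
        events.append((end, -1))    # and released at the end
    # At equal dates, (t, -1) sorts before (t, 1): release precedes pickup.
    events.sort()
    load = peak = 0
    for _, delta in events:
        load += delta
        peak = max(peak, load)
    return peak


def within_capacity(intervals, capacity):
    """True if the total demand never exceeds the capacity."""
    return max_load(intervals) <= capacity


# Hypothetical trolley activities (not taken from the paper's solution).
three_items = [(0, 90), (20, 110), (40, 130)]
four_items = three_items + [(60, 150)]
```

With a capacity of 3, as in the declaration of trolleyMaxCapacity above, the first set of intervals is feasible and the second is not.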

3 The Script Language OPLScript 

OPLScript is a language for composing and controlling OPL models. It is
particularly appropriate for applications that require solving several instances of
the same model, a sequence of models, or a combination of both as in
column-generation applications. See [?] for an overview of these functionalities. OPLScript
can also be used for controlling OPL models in order to find good solutions quickly
or to improve efficiency by exploiting more advanced techniques (e.g., shuffling
in job-shop scheduling). This section illustrates how OPLScript can be used to
find a good solution quickly on the final trolley application.

The motivation here is that it is sometimes appropriate to limit the time
devoted to the search for an optimal solution by restricting the number of failures,
the number of choice points, or the execution time. Figure 6 depicts a script for
the trolley problem that limits the number of failures when searching for a better
solution. The basic idea of the script is to allow for an initial credit of failures
(say, i) and to search for a solution within these limits. When a solution is found
with, say, f failures, the search is continued with a limit of i + f failures, i.e., the
number of failures needed to reach the last solution is increased by the initial
credit. Consider now the script in detail. The instruction

   Model m("trolley.mod","trolley.dat");

defines an OPLScript model in terms of its model and data files. Models are
first-class objects in OPLScript: they can be passed as parameters of procedures and
stored in data structures, and they also support a variety of methods. For
instance, the method nextSolution on a model can be used to obtain the
successive solutions of a model or, in optimization problems, to produce a sequence of
solutions, each of which improves the best value of the objective function found
so far. The instruction

   m.setFailLimit(fails);

specifies that the next call to nextSolution can perform at most fails failures,
i.e., after fails failures, the execution aborts and nextSolution() returns 0.
The instructions

   while m.nextSolution() do {
      solved := 1;
      cout << "solution with makespan: " << m.objectiveValue() << endl;
      m.setFailLimit(m.getNumberOfFails() + fails);
   }
   enum Jobs ...;
   enum Tasks ...;
   enum Location ...;
   {Location} Machines = ...;
   struct jobRecord {
      Location machine1;
      int durations1;
      Location machine2;
      int durations2;
   };
   jobRecord job[Jobs] = ...;
   int loadDuration = ...;
   int transition[Location,Location] = ...;
   int trolleyMaxCapacity = ...;

   Location location[Jobs,Tasks];
   initialize {
      forall(j in Jobs) {
         location[j,loadA] = areaA;
         location[j,unload1] = job[j].machine1;
         location[j,process1] = job[j].machine1;
         location[j,load1] = job[j].machine1;
         location[j,unload2] = job[j].machine2;
         location[j,process2] = job[j].machine2;
         location[j,load2] = job[j].machine2;
         location[j,unloadS] = areaS;
      };
   };

   int duration[Jobs,Tasks];
   initialize {
      forall(j in Jobs) {
         duration[j,loadA] = loadDuration;
         duration[j,unload1] = loadDuration;
         duration[j,process1] = job[j].durations1;
         duration[j,load1] = loadDuration;
         duration[j,unload2] = loadDuration;
         duration[j,process2] = job[j].durations2;
         duration[j,load2] = loadDuration;
         duration[j,unloadS] = loadDuration;
      };
   };



Fig. 4. The Final Trolley Model: Part I. 
   scheduleHorizon = 2000;
   UnaryResource machine[Machines];
   StateResource trolley(Location,transition);
   DiscreteResource trolleyCapacity(trolleyMaxCapacity);
   Activity act[i in Jobs, j in Tasks](duration[i,j])
      transitionType location[i,j];
   enum TrolleyTasks {onTrolleyA1, onTrolley12, onTrolley2S};
   Activity tact[Jobs,TrolleyTasks];
   Activity makespan(0);

   minimize
      makespan.end
   subject to {
      forall(j in Jobs & ordered t1, t2 in Tasks)
         act[j,t1] precedes act[j,t2];
      forall(j in Jobs) {
         act[j,process1] requires machine[job[j].machine1];
         act[j,process2] requires machine[job[j].machine2];
      };
      forall(j in Jobs, t in Tasks : t <> process1 & t <> process2)
         act[j,t] requiresState(location[j,t]) trolley;
      forall(j in Jobs, t in TrolleyTasks)
         tact[j,t] requires trolleyCapacity;
      forall(j in Jobs) {
         tact[j,onTrolleyA1].start = act[j,loadA].start;
         tact[j,onTrolleyA1].end = act[j,unload1].end;
         tact[j,onTrolley12].start = act[j,load1].start;
         tact[j,onTrolley12].end = act[j,unload2].end;
         tact[j,onTrolley2S].start = act[j,load2].start;
         tact[j,onTrolley2S].end = act[j,unloadS].end;
      };
      forall(j in Jobs)
         act[j,unloadS] precedes makespan;
   };

   search {
      setTimes(act);
   };



Fig. 5. The Final Trolley Model: Part II. 
   Model m("trolley.mod","trolley.dat");
   int fails := 1000;
   m.setFailLimit(fails);
   int solved := 0;
   while m.nextSolution() do {
      solved := 1;
      cout << "solution with makespan: " << m.objectiveValue() << endl;
      m.setFailLimit(m.getNumberOfFails() + fails);
   }
   m.setFailLimit(m.getNumberOfFails() + fails);
   if solved then {
      m.restore();
      cout << "final solution with makespan: " << m.objectiveValue() << endl;
   }
   cout << "Time: " << m.getTime() << endl;
   cout << "Fails: " << m.getNumberOfFails() << endl;
   cout << "Choices: " << m.getNumberOfChoicePoints() << endl;
   cout << "Variables: " << m.getNumberOfVariables() << endl;



Fig. 6. A Script for the Trolley Problem (trolley.osc).



make up the main loop of the script and produce a sequence of solutions, each
of which has a smaller makespan than the previous one. Note the instruction

   m.setFailLimit(m.getNumberOfFails() + fails);

that retrieves the number of failures incurred since the creation of model m and sets
a new limit by adding fails to this number. The next call to nextSolution takes
into account this new limit when searching for a better solution. Note also the
instruction m.restore() to restore the last solution found by nextSolution().
This script displays a result of the form 

   solution with makespan: 2310
   solution with makespan: 2170
   solution with makespan: 2080
   ...
   solution with makespan: 1690
   solution with makespan: 1650
   solution with makespan: 1620
   final solution with makespan: 1620
   Time: 7.0200
   Fails: 3578
   Choices: 3615
   Variables: 204
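The control strategy of the script can be simulated without a solver. The Python sketch below is an illustration, not OPLScript: it replaces the solver by a hypothetical trace listing, for each improving solution, the cumulative number of failures at which it would be found, and it applies the rule described above (after each solution, the limit becomes the failures used so far plus the initial credit). The trace values are made up for the example.

```python
def search_with_fail_credit(improvements, credit):
    """Simulate the failure-credit loop.

    improvements: list of (total_fails, makespan) pairs, one per improving
    solution in the order the search would find them, where total_fails is
    the cumulative number of failures when the solution is reached.
    Returns (best makespan found or None, final failure limit).
    """
    limit = credit          # m.setFailLimit(fails)
    best = None
    for total_fails, makespan in improvements:
        if total_fails > limit:
            break           # nextSolution() would return 0: limit exhausted
        best = makespan     # a better solution was found within the limit
        limit = total_fails + credit   # getNumberOfFails() + fails
    return best, limit


# Hypothetical search trace (failure counts are not from the paper).
TRACE = [(100, 2310), (400, 2170), (1300, 2080), (2500, 1690)]

result_small_credit = search_with_fail_credit(TRACE, 1000)
result_large_credit = search_with_fail_credit(TRACE, 2000)
```

With a credit of 1000 the simulated search stops at makespan 2080, because the next improvement needs 2500 failures while the limit is 2300; a credit of 2000 lets it reach 1690. The credit thus trades running time against solution quality.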









P. Van Hentenryck et al. 



4 Code Generation in OPL Studio 

Once a reasonable model has been successfully designed in OPL, it can be
integrated in a larger application through C++ code generation.



   int main(int argc, char* argv[])
   {
      IloSolver_trolleyComplete solver;
      if (solver.nextSolution()) {
         IloArray_act act = solver.get_act();
         IloEnum_Jobs Jobs = solver.get_Jobs();
         IloEnum_Tasks Tasks = solver.get_Tasks();
         IloEnumIterator_Jobs iteJobs(Jobs);
         while (iteJobs.ok()) {
            IloEnumIterator_Tasks iteTasks(Tasks);
            while (iteTasks.ok()) {
               cout << "act[" << *iteJobs << "," << *iteTasks << "] = ";
               cout << act[*iteJobs][*iteTasks] << endl;
               ++iteTasks;
            }
            ++iteJobs;
         }
         cout << endl;
      }
      solver.end();
      return 0;
   }



Fig. 7. C++ Code for the Trolley Problem. 



The basic idea behind code generation consists of producing a C++ class
associated with each object in the model and a top-level class for the model. In other
words, the generated code is specialized to the model and is strongly typed.
These classes can then be used to access and modify the data and, of course, to solve
the model and collect the results. Figure 7 depicts C++ code to obtain the first
solution to the trolley application and to display the main activities. The instruction
IloSolver_trolleyComplete solver; defines an instance solver that encapsulates the
functionality of the trolley model. The class definition is available in the .h file,
which is not shown here. The instruction IloArray_act act = solver.get_act(); is
used to obtain the array of activities, while the instructions

   IloEnum_Jobs Jobs = solver.get_Jobs();
   IloEnum_Tasks Tasks = solver.get_Tasks();
   IloEnumIterator_Jobs iteJobs(Jobs);

obtain the enumerated types and define an iterator for the jobs. The remaining
instructions iterate over the enumerated sets and display the activities.
Abstract. Inductive logic programming (ILP) is a research area that
has its roots in inductive machine learning and logic programming.
Computational logic has significantly influenced machine learning through the
field of inductive logic programming (ILP), which is concerned with the
induction of logic programs from examples and background knowledge.
Machine learning, and ILP in particular, has the potential to influence
computational logic by providing an application area full of industrially
significant problems, thus providing a challenge for other techniques in
computational logic. In ILP, the recent shift of attention from program
synthesis to knowledge discovery resulted in advanced techniques that
are practically applicable for discovering knowledge in relational
databases. This paper gives a brief introduction to ILP, presents state-of-the-art
ILP techniques for relational knowledge discovery as well as some
challenges and directions for further developments in this area.



1 Introduction 

Inductive logic programming (ILP) [47, 51, 40] is a research area that has its
roots in inductive machine learning and logic programming. ILP research aims at
a formal framework as well as practical algorithms for inductive learning of
relational descriptions that typically have the form of logic programs. From
logic programming, ILP has inherited its sound theoretical basis, and from
machine learning, an experimental approach and orientation towards practical
applications. ILP research has also been strongly influenced by computational
learning theory and, recently, by Knowledge Discovery in Databases (KDD),
which led to the development of new techniques for relational data mining.

In general, an ILP learner is given an initial theory B (background know- 
ledge) and some evidence E (examples), and its aim is to induce a theory H 
(hypothesis) that together with B explains some properties of E. In most cases 
the hypothesis H has to satisfy certain restrictions, which we shall refer to as 
a bias. Bias includes prior expectations and assumptions, and can therefore be 
considered as the logically unjustified part of the background knowledge. Bias is 
needed to reduce the number of candidate hypotheses. It consists of the language 
bias L, determining the hypothesis space, and the search bias which restricts the 
search of the space of possible hypotheses. 

The background knowledge used to construct hypotheses is a distinctive feature
of ILP. It is well known that relevant background knowledge may substantially
improve the results of learning in terms of accuracy, efficiency and the
explanatory potential of the induced knowledge. On the other hand, irrelevant
background knowledge will have just the opposite effect. Consequently, much
of the art of inductive logic programming lies in the appropriate selection and
formulation of background knowledge to be used by an ILP learner.

This paper first gives a brief introduction to ILP and presents a selection
of recently developed ILP techniques for relational knowledge discovery. The
overview is restricted to techniques satisfying the strong criterion formulated
for machine learning by Michie [?], which requires an explicit symbolic form of
induced descriptions. The paper proceeds with ILP challenges and concludes with
some suggestions for further research in ILP.

2 State-of-the-Art in ILP 

This section briefly introduces two basic theoretical settings, gives pointers to 
successful ILP applications and presents recent technological developments in 
the area, categorized into the two main theoretical settings. 

2.1 ILP Problem Specification 

An inductive logic programming task can be formally defined as follows: 

Given: 

• a set of examples E 

• a background theory B 

• a language bias L that defines the clauses allowed in hypotheses 

• a notion of explanation 

Find: a hypothesis $H \subseteq L$ which explains the examples E with respect
to the theory B.

This definition needs to be instantiated for different types of ILP tasks [?].
The instantiation will concern the representation of training examples, the
choice of a hypothesis language and an appropriate notion of explanation. By
explanation we here refer to an acceptance criterion of hypotheses: the
hypothesis explains the data if it satisfies a certain user-defined criterion
w.r.t. the data. We will discuss some formal acceptance criteria used in
different ILP settings, but we also need to bear in mind that ILP aims at the
induction of hypotheses that are expressed in an explicit symbolic form, that
can be easily interpreted by the user/expert, and that may contribute to a
better understanding of the problem addressed, ideally forming a piece of new
knowledge discovered from the data.

2.2 ILP Settings 

The state-of-the-art ILP settings are reviewed below. For the underlying theory
see [?]. For a practical introduction to ILP see [40]. Transparencies of the
Summer School in ILP and KDD, held in Prague in September 1997, giving an
introduction to ILP can be found at:
www-ai.ijs.si/SasoDzeroski/ILP2/ilpkdd/.

Predictive ILP. Predictive ILP is the most common ILP setting, often referred
to as normal ILP, explanatory induction, discriminatory induction, or strong
ILP. Predictive ILP is aimed at learning of classification and prediction rules.
This ILP setting typically restricts E to ground facts, and H and B to sets of
definite clauses. The strict notion of explanation in this setting usually
denotes coverage and requires global completeness and consistency.

Global completeness and consistency implicitly assume the notion of
intensional coverage, defined as follows. Given background theory B, hypothesis
H and example set E, an example $e \in E$ is (intensionally) covered by H if
$B \cup H \models e$. Hypothesis H is (globally) complete if
$\forall e \in E^+ : B \cup H \models e$. Hypothesis H is (globally)
consistent if $\forall e \in E^- : B \cup H \not\models e$.

Given the restriction to definite theories $T = H \cup B$, for which there
exists a unique least Herbrand model $M(T)$, and to ground atoms as examples,
this is equivalent to requiring that all examples in $E^+$ are true in
$M(B \cup H)$ [51].
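
Written out in terms of the least Herbrand model, the two requirements can be
restated as set conditions. This is only a restatement under the restriction
just given (definite theories, ground atoms as examples), where a ground atom
is entailed exactly when it belongs to the least Herbrand model:

```latex
\begin{align*}
H \text{ is complete}   &\iff E^{+} \subseteq M(B \cup H), \\
H \text{ is consistent} &\iff E^{-} \cap M(B \cup H) = \emptyset .
\end{align*}
```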

By relaxing the notion of explanation to allow incomplete and inconsistent
theories that satisfy some other acceptance criteria (predictive accuracy,
significance, compression), the predictive ILP setting can be extended to
include learning of classification and prediction rules from imperfect data, as
well as learning of logical decision trees [2]. In a broader sense, predictive
ILP also incorporates first-order regression [?] and constraint inductive logic
programming [55], for which again different acceptance criteria apply.



Descriptive ILP. Descriptive ILP is sometimes referred to as confirmatory
induction, non-monotonic ILP, description learning, or weak ILP. Descriptive
ILP is usually aimed at learning of clausal theories [?]. This ILP setting
typically restricts B to a set of definite clauses, H to a set of (general)
clauses, and E to positive examples. The strict notion of explanation used in
this setting requires that all clauses c in H are true in some preferred model
of $T = B \cup E$, where the preferred model of T may be, for instance, the
least Herbrand model $M(T)$. (One may also require the completeness and
minimality of H, where completeness means that a maximally general hypothesis H
is found, and minimality means that the hypothesis does not contain redundant
clauses.)

By relaxing the strict notion of explanation used in clausal discovery [13] to
allow for theories that satisfy some other acceptance criteria (similarity,
associativity, interestingness), descriptive ILP can be extended to incorporate
learning of association rules [8], first-order clustering [12, 21, 36], database
restructuring [24, 56], subgroup discovery [?], learning qualitative models [?]
and equation discovery [19].

An illustrative example. Consider a problem of learning family relations
where the predictive knowledge discovery task is to define the target relation
daughter(X,Y), which states that person X is a daughter of person Y, in terms
of relations defined in background knowledge B. Let the training set E consist
of positive and negative examples for the target predicate daughter/2. A
positive example $e \in E^+$ provides information known to be true and should
be entailed by the induced hypothesis. A negative example $e \in E^-$ provides
information that is known not to be true and should not be entailed.

E+ = {daughter(mary,ann), daughter(eve,tom)}

E- = {daughter(tom,ann), daughter(eve,ann)}

B = {mother(ann,mary), mother(ann,tom), father(tom,eve),
     father(tom,ian), female(ann), female(mary), female(eve),
     parent(X,Y) ← mother(X,Y), parent(X,Y) ← father(X,Y),
     male(pat), male(tom)}

If the hypothesis language L contains all definite clauses using the predicate
and functor symbols appearing in the examples and background knowledge, a
predictive ILP system can induce the following clause from E+, E- and B:

daughter(X,Y) ← female(X), parent(Y,X).

Alternatively, a learner could have induced a set of clauses: 

daughter(X,Y) ← female(X), mother(Y,X).

daughter(X,Y) ← female(X), father(Y,X).

In descriptive knowledge discovery, given E+ and B only, an induced theory
could contain the following clauses:

← daughter(X,Y), mother(X,Y).

female(X) ← daughter(X,Y).

mother(X,Y); father(X,Y) ← parent(X,Y).

One can see that in the predictive knowledge discovery setting classification 
rules are generated, whereas in the descriptive setting database regularities are 
derived. 
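
The family example can be checked mechanically. The following is a minimal
sketch, not the procedure of any particular ILP system: it computes the least
Herbrand model by naive forward chaining and then tests completeness and
consistency of the induced clause, and the truth of the three descriptive
clauses in the model of B and E+. The function names (least_model, holds) and
the tuple encoding of atoms are assumptions made for illustration.

```python
from itertools import product

# Background knowledge B as ground facts (copied from the example above).
FACTS = {
    ("mother", "ann", "mary"), ("mother", "ann", "tom"),
    ("father", "tom", "eve"), ("father", "tom", "ian"),
    ("female", "ann"), ("female", "mary"), ("female", "eve"),
    ("male", "pat"), ("male", "tom"),
}
# Clauses as (head atoms, body atoms); upper-case arguments are variables.
PARENT_RULES = [
    ([("parent", "X", "Y")], [("mother", "X", "Y")]),
    ([("parent", "X", "Y")], [("father", "X", "Y")]),
]
HYPOTHESIS = [([("daughter", "X", "Y")], [("female", "X"), ("parent", "Y", "X")])]
E_POS = {("daughter", "mary", "ann"), ("daughter", "eve", "tom")}
E_NEG = {("daughter", "tom", "ann"), ("daughter", "eve", "ann")}


def substitute(atom, theta):
    """Apply a variable binding to an atom (predicate name is left alone)."""
    return (atom[0],) + tuple(theta.get(t, t) for t in atom[1:])


def groundings(atoms, constants):
    """Enumerate all bindings of the variables occurring in the given atoms."""
    variables = sorted({t for a in atoms for t in a[1:] if t[0].isupper()})
    for values in product(constants, repeat=len(variables)):
        yield dict(zip(variables, values))


def least_model(facts, rules):
    """Naive forward chaining for definite clauses (exactly one head atom)."""
    constants = sorted({t for f in facts for t in f[1:]})
    model = set(facts)
    changed = True
    while changed:
        changed = False
        for heads, body in rules:
            for theta in groundings(heads + body, constants):
                if all(substitute(a, theta) in model for a in body):
                    atom = substitute(heads[0], theta)
                    if atom not in model:
                        model.add(atom)
                        changed = True
    return model


def holds(heads, body, model):
    """A clause is true in a model if every grounding that satisfies the
    body also satisfies at least one head atom (empty head = denial)."""
    constants = sorted({t for f in model for t in f[1:]})
    for theta in groundings(heads + body, constants):
        if all(substitute(a, theta) in model for a in body):
            if not any(substitute(a, theta) in model for a in heads):
                return False
    return True


# Predictive setting: B together with H must entail E+ and none of E-.
model = least_model(FACTS, PARENT_RULES + HYPOTHESIS)
complete = E_POS <= model
consistent = not (E_NEG & model)

# Descriptive setting: clauses must be true in the model of B and E+.
descriptive_model = least_model(FACTS | E_POS, PARENT_RULES)
DESCRIPTIVE = [
    ([], [("daughter", "X", "Y"), ("mother", "X", "Y")]),
    ([("female", "X")], [("daughter", "X", "Y")]),
    ([("mother", "X", "Y"), ("father", "X", "Y")], [("parent", "X", "Y")]),
]
all_hold = all(holds(h, b, descriptive_model) for h, b in DESCRIPTIVE)
print("complete:", complete, "consistent:", consistent, "descriptive:", all_hold)
```

Enumerating all groundings is exponential in the number of variables, so this
is suitable only for toy examples of this size.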



Other ILP settings. There has been a suggestion [?] of how to integrate the
two main settings of predictive and descriptive ILP. In this integrated
framework the learned theory is a combination of (predictive) rules and
(descriptive) integrity constraints that restrict the consequences of these
rules.

Other ILP settings have also been investigated, the most important being
relational instance-based learning [?]. Excellent predictive results have been
achieved by the relational instance-based learner RIBL [?] in numerous
classification and prediction tasks. Recently, first-order reinforcement
learning [?] and a first-order Bayesian classifier [?] have also been studied.
Since these ILP settings do not involve hypothesis formation in explicit
symbolic form, the developed techniques do not qualify as techniques for
relational knowledge discovery.

2.3 ILP Applications

Most of the current real-world ILP applications involve predictive knowledge
discovery, in particular the induction of classification and prediction rules
from a given database of examples and the available background knowledge.
Successful ILP applications include drug design [?], protein secondary
structure prediction [?], mutagenicity prediction [?], carcinogenesis
prediction [?], medical diagnosis [44], discovery of qualitative models in
medicine [?], finite-element mesh design, telecommunications, natural language
processing, recovering software specifications [7], and many others.

There are many overview papers on ILP applications, including an early
overview [?]. A tutorial on ILP applications is available at
www-ai.ijs.si/SasoDzeroski/ILP2/ilpkdd/.

A report providing short descriptions of applications developed as part of the
ESPRIT IV project ILP2 is available at
www-ai.ijs.si/SasoDzeroski/ILP2/ilpkdd/appsdel/.

An ILP applications bibliography is available at
www-ai.ijs.si/SasoDzeroski/ILP2/ilpkdd/ilpab.ps.

A sample ILP application. An application on river water quality evaluation
is outlined, illustrating the predictive and descriptive ILP settings.

Given a list of biological indicators present at a river water sampling site and
their abundance levels (300 samples), the predictive knowledge discovery task is
to classify the sample into one of the five classes B1a, B1b, B2, B3 and B4
[18]. Knowledge about biological indicators (families of macro-benthic
invertebrates) consisted of eighty predicates of the form family(X,A), each
denoting that the family is present in sample X with abundance level A (e.g.,
tipulidae(X,A), asellidae(X,A), etc.); background knowledge about the relations
between abundance levels was also applied.

The ILP system Golem [?] induced three rules for class B1a, fourteen for
B1b, sixteen for B2, two for B3 and none for B4. According to the evaluation by
an expert river ecologist, many of the rules were interesting and agreed with
expert knowledge. Here are two examples of such rules.

b1a(X) ← leuctridae(X,A).

b1b(X) ← ancylidae(X,A), gammaridae(X,B), hydropsychidae(X,C),
         rhyacophilidae(X,D), greater_than(B,A), greater_than(B,D).

The first rule, covering forty-three positive and four negative examples, states
that a sample belongs to the best water quality class B1a if Leuctridae is
present. Indeed, according to the expert, Leuctridae is an indicator of good
water quality. The second rule states that Gammaridae in abundance is a good
indicator of class B1b, along with the other families present.
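
To make the reading of the two rules concrete, the sketch below applies them to
a handful of samples and counts covered positive and negative examples, in the
way the coverage figures above are reported. The samples, the integer abundance
scale and the reading of greater_than as numeric comparison are all
hypothetical; they are not taken from the original data set.

```python
# Each hypothetical sample: ({family: abundance level}, known quality class).
SAMPLES = [
    ({"leuctridae": 2, "gammaridae": 1}, "b1a"),
    ({"ancylidae": 1, "gammaridae": 3,
      "hydropsychidae": 2, "rhyacophilidae": 1}, "b1b"),
    ({"ancylidae": 2, "gammaridae": 2,
      "hydropsychidae": 1, "rhyacophilidae": 1}, "b2"),
    ({"leuctridae": 1, "asellidae": 3}, "b2"),
]


def rule_b1a(sample):
    """b1a(X) <- leuctridae(X,A): the family only has to be present."""
    return "leuctridae" in sample


def rule_b1b(sample):
    """b1b(X) <- four families present, with Gammaridae more abundant
    than both Ancylidae and Rhyacophilidae."""
    needed = ("ancylidae", "gammaridae", "hydropsychidae", "rhyacophilidae")
    if any(family not in sample for family in needed):
        return False
    b = sample["gammaridae"]
    return b > sample["ancylidae"] and b > sample["rhyacophilidae"]


def coverage(rule, target_class):
    """Return (positives covered, negatives covered) for a rule."""
    pos = sum(1 for s, c in SAMPLES if rule(s) and c == target_class)
    neg = sum(1 for s, c in SAMPLES if rule(s) and c != target_class)
    return pos, neg


print("b1a rule:", coverage(rule_b1a, "b1a"))
print("b1b rule:", coverage(rule_b1b, "b1b"))
```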

The same data was used for descriptive knowledge discovery by the ILP system
Claudien [?]. Claudien checks all possible clauses, within a given language
bias, for consistency w.r.t. the given examples. For instance, the first clause
below specifies more than one quality class for which the family Perlodidae is
indicative. If, in addition, Rhyacophilidae are also present, then this
indicates the best water quality class B1a, as shown in the second clause.

b1a(X); b1b(X) ← perlodidae(X,A).

b1a(X) ← perlodidae(X,A), rhyacophilidae(X,A).

The river quality experts particularly liked the clauses induced by Claudien
due to the coverage of more than one class by a single clause; this is important
since the classification results were achieved by a discretization of the
originally continuous classification problem.



2.4 ILP Techniques 

This section reviews the state-of-the-art ILP techniques, most of which have
already shown their potential for use in real-life applications. The overview
is limited to recent ILP developments aimed at data mining from real-life
databases [41]. These developments have a marketing potential in the prosperous
new areas of Data Mining and Knowledge Discovery in Databases. Further
information on some of the systems can be found at
www-ai.ijs.si/SasoDzeroski/ILP2/ilpkdd/, and more details on some of the
systems recently developed as part of the ESPRIT IV project ILP2 at
www-ai.ijs.si/SasoDzeroski/ILP2/ilpkdd/sysdel/.

It is worth noting that none of the reviewed techniques is a programming
assistant; such tools have a much smaller marketing potential and a limited
usefulness for solving real-life problems in comparison with ILP data mining
tools and techniques.



Predictive ILP 

Learning of classification rules. This is the standard ILP setting that has
been used in numerous successful predictive knowledge discovery applications.
The well-known systems for classification rule induction include Foil [?],
Golem [?] and Progol [?]. Foil is efficient and best understood due to its
similarity to Clark and Niblett's CN2. (A successor of Foil, the system Ffoil,
can successfully be used for inducing relational definitions of functions.) On
the other hand, Golem and Progol are champions concerning successful ILP
applications, despite the fact that they are substantially less efficient. Foil
is a top-down learner, Golem is a bottom-up learner, and Progol uses a combined
search strategy. All are mainly concerned with single predicate learning from
positive and negative examples and background knowledge; in addition, Progol
can also be used to learn from positive examples only. They use different
acceptance criteria: compression, coverage/accuracy and minimal description
length, respectively. The system LINUS [39, ?], developed from a learning
component of QuMAS [46], introduced the propositionalization paradigm by
transforming an ILP problem into a propositional learning task.
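
The idea of propositionalization can be illustrated on the daughter example of
Section 2.2. The sketch below is only in the spirit of LINUS, not its actual
implementation: each example daughter(X,Y) becomes one row of Boolean features,
obtained by applying the background predicates to the arguments of the target
relation. The feature naming scheme and the pre-computed parent relation are
assumptions made for illustration.

```python
from itertools import product

# Background relations (parent is assumed to be pre-computed from
# mother/father as in the family example).
PARENT = {("ann", "mary"), ("ann", "tom"), ("tom", "eve"), ("tom", "ian")}
FEMALE = {"ann", "mary", "eve"}
MALE = {"pat", "tom"}

# Training examples for daughter(X,Y): (X, Y, class label).
EXAMPLES = [
    ("mary", "ann", True), ("eve", "tom", True),
    ("tom", "ann", False), ("eve", "ann", False),
]


def features(x, y):
    """Turn one relational example into a row of Boolean attributes."""
    args = {"X": x, "Y": y}
    row = {}
    for var, value in args.items():
        row[f"female({var})"] = value in FEMALE
        row[f"male({var})"] = value in MALE
    for (v1, a), (v2, b) in product(args.items(), repeat=2):
        row[f"parent({v1},{v2})"] = (a, b) in PARENT
    return row


# The resulting single table can be handed to any propositional learner.
table = [(features(x, y), label) for x, y, label in EXAMPLES]

# The conjunction female(X) and parent(Y,X) separates the two classes here.
separates = all(
    (row["female(X)"] and row["parent(Y,X)"]) == label for row, label in table
)
for row, label in table:
    print(label, [name for name, value in row.items() if value])
```

A rule learned on this table, such as the conjunction checked above, translates
directly back into the clause daughter(X,Y) ← female(X), parent(Y,X).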

Induction of logical decision trees. The system Tilde [2] belongs to the
family of top-down induction of decision tree algorithms. It can be viewed as a
first-order upgrade of Quinlan's C4.5, employing logical queries in tree nodes,
which involves appropriate handling of variables. The main advantage of Tilde
is its efficiency and capability of dealing with large numbers of training
examples, which are the well-known properties of Tilde's propositional
ancestors. Hence Tilde currently represents one of the most appropriate systems
for predictive knowledge discovery. Besides the language bias, Tilde allows for
lookahead and prepruning (according to the minimal number of examples covered)
defined by parameter setting.

First-order regression. The relational regression task can be defined as
follows: Given training examples as positive ground facts for the target
predicate $r(Y, X_1, \ldots, X_n)$, where the variable Y has real values, and
background knowledge predicate definitions, find a definition for
$r(Y, X_1, \ldots, X_n)$, such that each clause has a literal binding Y
(assuming that $X_1, \ldots, X_n$ are bound). Typical background knowledge
predicates include less-or-equal tests, addition, subtraction and
multiplication. An approach to relational regression is implemented in the
system FORS (First Order Regression System) [?], which performs top-down search
of a refinement graph. In each clause, FORS can predict a value for the target
variable Y as the output value of a background knowledge literal, as a
constant, or as a linear combination of variables appearing in the clause
(using linear regression).
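
As a rough illustration of the last two options, the sketch below fits the
value of Y for the examples covered by a clause body, once as a linear function
of a variable and once as a constant. It does not reproduce the refinement
graph search of FORS; the data, the threshold test and the helper fit_line are
hypothetical.

```python
def fit_line(points):
    """Ordinary least squares for y = a*x + b over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x


# Hypothetical ground facts r(Y, X1), stored as (Y, X1) pairs.
EXAMPLES = [(2.0 * x + 1.0, x) for x in range(0, 6)]
EXAMPLES += [(11.0, x) for x in range(6, 10)]
THRESHOLD = 5  # assumed less-or-equal test from the background knowledge

# Clause 1: r(Y, X1) <- X1 =< 5, Y is a*X1 + b   (linear combination)
covered = [(x, y) for y, x in EXAMPLES if x <= THRESHOLD]
a, b = fit_line(covered)

# Clause 2: r(Y, X1) <- X1 > 5, Y is c           (constant)
rest = [y for y, x in EXAMPLES if x > THRESHOLD]
c = sum(rest) / len(rest)


def predict(x1):
    """Evaluate the two-clause definition for a given X1."""
    return a * x1 + b if x1 <= THRESHOLD else c


print(f"a={a:.3f} b={b:.3f} c={c:.3f}")
```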

Inductive Constraint Logic Programming. It is well known that Con- 
straint Logic Programming (CLP) can successfully deal with numerical con- 
straints. The idea of Inductive Constraint Logic Programming (ICLP) is to 
benefit from the number-handling capabilities of CLP, and to use the constraint 
solver of CLP to do part of the search involved in inductive learning. To this 
end a maximally discriminant generalization problem in ILP is transformed to 
an equivalent constraint satisfaction problem (CSP). The solutions of the origi- 
nal ILP problem can be constructed from the solutions of CSP, which can be 
obtained by running a constraint solver on CSP. 



Descriptive ILP 

Learning of clausal theories and association rules. In discovering full
clausal theories, as done in the system Claudien [?], each example is a
Herbrand model, and the system searches for the most general clauses that are
true in all the models. Clauses are discovered independently from each other,
which is a substantial advantage for data mining, as compared to the learning
of classification rules (particularly learning of mutually dependent predicates
in multiple predicate learning). In Claudien, the search for clauses is limited
by the language bias. Its acceptance criterion can be modified by setting two
parameters: the requested minimal accuracy and minimal number of examples
covered. In another clausal discovery system, Primus [?], the best-first search
for clauses is guided by heuristics measuring the "confirmation" of clauses.
The Claudien system was further extended to Warmr [8], which enables learning
of association rules from multiple relations.
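
The acceptance test described above can be sketched as follows, under a strong
simplification: every example is a propositional interpretation (a set of true
atoms), so variables and abundance arguments are dropped. The samples and the
parameter names min_accuracy and min_covered are hypothetical and are not the
actual parameters of Claudien.

```python
# Hypothetical interpretations: the atoms that are true in each sample.
SAMPLES = [
    {"perlodidae", "rhyacophilidae", "b1a"},
    {"perlodidae", "b1b"},
    {"perlodidae", "rhyacophilidae", "b1a"},
    {"gammaridae", "b2"},
    {"perlodidae", "b2"},  # a noisy sample that violates the first clause
]


def clause_stats(head, body, interpretations):
    """Return (covered, accuracy) for the clause 'head <- body'.

    covered  : number of interpretations in which all body atoms are true
    accuracy : fraction of those in which at least one head atom is true
    """
    covered = [i for i in interpretations if body <= i]
    if not covered:
        return 0, 0.0
    satisfied = sum(1 for i in covered if head & i)
    return len(covered), satisfied / len(covered)


def accept(head, body, interpretations, min_accuracy=1.0, min_covered=1):
    """Relaxed acceptance: enough covered examples, accurate enough."""
    covered, accuracy = clause_stats(head, body, interpretations)
    return covered >= min_covered and accuracy >= min_accuracy


# b1a; b1b <- perlodidae        and        b1a <- perlodidae, rhyacophilidae
stats_1 = clause_stats({"b1a", "b1b"}, {"perlodidae"}, SAMPLES)
stats_2 = clause_stats({"b1a"}, {"perlodidae", "rhyacophilidae"}, SAMPLES)
print(stats_1, stats_2)
```

With min_accuracy=1.0 a clause must be true in every model, which corresponds
to the strict setting; lowering the threshold admits clauses with exceptions.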

First-order clustering. Top-down induction of decision trees can be viewed
as a clustering method since nodes in the tree correspond to sets of examples
with similar properties, thus forming concept hierarchies. This view was adopted
in C0.5 [12], an upgrade of the Tilde logical decision tree learner. A
relational distance-based clustering is also presented in [36]. An early
approach combining learning and conceptual clustering techniques was
implemented in the system Cola [21]. Given a small (sparse) set of classified
training instances and a set of unclassified instances, Cola uses Bisson's
conceptual clustering algorithm KBG on the entire set of instances, climbs the
hierarchy tree and uses the classified instances to identify (single or
disjunctive) class descriptions.

Database restructuring. The system Fender [?] searches for common
parts of rules describing a concept, thus forming subconcept definitions to be
used in the reformulation of the original rules. The result is a knowledge base
with new intermediate concepts and deeper inferential structure than the
initial "flat" rulebase. The system Index [?] is concerned with the problem of
determining which attribute dependencies (functional or multivalued) hold in
the given relational database. The induced attribute dependencies can be used
to obtain a more structured database. Both approaches can be viewed as doing
predicate invention, where (user selected) invented predicates are used for
theory restructuring.

Subgroup discovery. The subgroup discovery task is defined as follows:
given a population of individuals and a property of those individuals we are
interested in, find the subgroups of the population that are statistically "most
interesting", i.e., are as large as possible and have the most unusual
statistical (distributional) characteristics with respect to the property of
interest. The system Midos [?] guides the top-down search of potentially
interesting subgroups using numerous user-defined parameters.
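
One way to make "as large as possible and most unusual" concrete is a quality
function that multiplies the relative size of a subgroup by the deviation of
its target share from the population share. The sketch below uses weighted
relative accuracy for this purpose. This is a commonly used measure chosen here
for illustration only; the text does not state which quality function Midos
uses, and the population and attribute names are hypothetical.

```python
# Hypothetical population; "target" is the property of interest.
POPULATION = [
    {"a": True, "b": True, "target": True},
    {"a": True, "b": False, "target": True},
    {"a": True, "b": False, "target": True},
    {"a": True, "b": True, "target": False},
    {"a": False, "b": True, "target": True},
    {"a": False, "b": False, "target": False},
    {"a": False, "b": False, "target": False},
    {"a": False, "b": True, "target": False},
]


def wracc(population, in_subgroup):
    """Weighted relative accuracy: (n/N) * (p_subgroup - p_population)."""
    members = [p for p in population if in_subgroup(p)]
    if not members:
        return 0.0
    p_all = sum(p["target"] for p in population) / len(population)
    p_sub = sum(p["target"] for p in members) / len(members)
    return (len(members) / len(population)) * (p_sub - p_all)


# Candidate subgroup descriptions (single conditions only, for brevity).
CANDIDATES = {
    "a": lambda p: p["a"],
    "b": lambda p: p["b"],
    "not a": lambda p: not p["a"],
}
scores = {name: wracc(POPULATION, test) for name, test in CANDIDATES.items()}
best = max(scores, key=scores.get)
print(scores, "best:", best)
```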

Learning qualitative models of dynamic systems. The automated construction
of models of dynamic systems may be aimed at qualitative model discovery. A
recent qualitative model discovery system [?], using a Qsim-like
representation, is based on Coiera's Genmodel, to which signal processing
capabilities have been added.

Equation discovery. The system LAGRANGE [19] discovers a set of differential
equations from an example behavior of a dynamic system. Example behaviors are
specified by lists of measurements of a set of system variables, and background
knowledge predicates enable the introduction of new variables as time
derivatives, sines or cosines of system variables. New variables can be
further introduced by multiplication.
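
The core step, introducing a time derivative as a new variable and looking for
a linear relation among the variables, can be sketched as follows. This is an
illustration only and not the LAGRANGE algorithm: the measured behavior is
synthetic (generated from x(t) = exp(-2t)), the derivative is approximated by
central differences, and a single candidate equation dx/dt = c·x is fitted by
least squares.

```python
import math

DT = 0.001
TIMES = [i * DT for i in range(2001)]

# Hypothetical "measurements" of one system variable x.
X = [math.exp(-2.0 * t) for t in TIMES]

# New variable: the time derivative of x, by central differences.
DX = [(X[i + 1] - X[i - 1]) / (2 * DT) for i in range(1, len(X) - 1)]
X_INNER = X[1:-1]  # x at the time points where DX is defined

# Fit the candidate equation dx/dt = c * x (least squares, no intercept).
c = sum(d * x for d, x in zip(DX, X_INNER)) / sum(x * x for x in X_INNER)

# A small residual indicates that the candidate equation fits the behavior.
residual = max(abs(d - c * x) for d, x in zip(DX, X_INNER))
print(f"dx/dt = {c:.4f} * x, max residual {residual:.2e}")
```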

2.5 Influence of Different Research Areas on ILP

The research areas that have strongly influenced ILP are machine learning,
computational learning theory, computational logic and, recently, knowledge
discovery in databases.

Machine learning. ILP has its roots in machine learning, and most ILP
researchers did machine learning research before entering ILP. Machine
learning has always provided the basic research philosophy where experimental
evaluation and applications play a key role in the development of novel
techniques and tools. This was the case in the early days of ILP as well as now
in its mature stage.

Computational learning theory. COLT has helped ILP to better under- 
stand the learnability issues and provided some basic learning algorithms that 
were studied and adapted for the needs of ILP. However, COLT has not influ- 
enced ILP research as strongly as other areas reviewed in this section. 

Computational logic. Computational logic, and logic programming in
particular, played an extremely important role in early ILP research. Besides
providing a framework that helped to develop the theory of ILP, it provided
well-studied representational formalisms and an initially challenging
application area of program synthesis. Due to the difficulty of this
application task, which cannot be solved without very strong biases and
restrictions on the hypothesis language, program synthesis has become
substantially less popular in recent years.

Knowledge discovery in databases. The role that computational logic
played in the early days of ILP has recently to a large extent been taken by
knowledge discovery in databases (KDD) and data mining. ILP is very well
suited to filling the gap between the current KDD tools capable of dealing
with single data tables, and the need to deal with large data collections
stored in multiple relational tables. The influence of KDD on ILP is expected
to grow substantially in the coming years.



3 Future Challenges for ILP 

This section first presents connections among areas of computational logic
that may prove fruitful in future research on ILP, together with some
application challenges for ILP, and continues with the technological advances
that will be needed to deal with these challenges. Challenges addressed as a
short-term research orientation for ILP are complemented with longer-term
challenges for ILP and machine learning/KDD in general. The long-term
challenges address the problem of dealing with a variety of data sources,
including images, sound and video inputs.

3.1 Challenging Application Areas

Application Areas Related to Computational Logic

Compulog Net has identified some of the main research directions for
computational logic, having in mind their industrial relevance in solving
real-life problems. These areas include inductive logic programming (ILP),
constraint logic programming (CLP) and natural language processing (NLP). The
identified areas can be viewed as challenging future research directions for
ILP.

Constraint logic programming. As shown in the overview of techniques
for predictive ILP in Section 2.4, the connection between ILP and CLP has
already been established through the work on Inductive Constraint Logic
Programming. ILP has recognized the potential of CLP number-handling and of
CLP constraint solving to do part of the search involved in inductive learning.
Early work in this area by Page and Frisch, Mizoguchi and Ohwada in 1992,
and recent work by Sebag and Rouveirol [?] show the potential of merging ILP
and CLP, which has to be explored to a larger extent in the future. Due to the
industrial relevance of these two areas of computational logic it is expected
that the developments at their intersection may result in products of great
industrial benefit. It is unfortunate to note, however, that (to the best of my
knowledge) no systematic effort in joining the research efforts of groups in
ILP and CLP is envisioned in the near future.

Natural language processing. Stephen Muggleton has recently proposed
devoting substantial research effort to the area of Learning Language in
Logic (LLL). The intention is to apply ILP in various areas of natural language
processing, including the syntactical and morphological analysis of corpus data,
using the existing large grammars as background knowledge. To this end, the
first LLL workshop will be organized in summer 1999 in Bled, Slovenia, and
a project proposal will be submitted to the 5th Framework by some of the
partners of the ILP2 project consortium and the most important NLP research
groups.

Abduction. Other initiatives among areas of computational logic have
recently also identified the potential for mutual benefits. One of these is the
Abduction and Induction initiative of Peter Flach and Antonis Kakas that
resulted in a series of workshops on this topic and the forthcoming edited
volume Abduction and Induction: Essays on their Relation and Integration [25].
Early research in this direction by Luc De Raedt (the system CLINT) and more
recent work by Dimopoulos and Kakas [?] show the potential for the merging of
these technologies. A new ILP framework and system, called ACL [32], for
abductive concept learning has been developed and used to study the problems of
learning from incomplete background data and of multiple predicate learning.
More work in this area is expected in the future.

Higher-order logic. Some work towards the use of higher-order reasoning
and the use of functional languages has also started recently, in particular
using the declarative programming language Escher (facilitating the use of
higher-order features) for learning and hypothesis representation [28]. This
work may be a first step towards a larger research initiative in using
higher-order features in ILP.

Deductive databases. A tighter connection with deductive database
technology has been recently advocated by Luc De Raedt [?], introducing an
inductive database mining query language that integrates concepts from ILP,
CLP, deductive databases and meta-programming into a flexible environment
for relational knowledge discovery in databases. Since the primitives of the
language can easily be combined with Prolog, complex systems and behaviour can
be specified declaratively. This type of integration of concepts from different
areas of computational logic can prove extremely beneficial for ILP in the
future. It can lead to a novel ILP paradigm of inductive logic programming
query languages whose usefulness may prove to be similar to that of constraint
logic programming.

Other LP based advances. The use of more advanced LP techniques may follow
from the recent interest of LP researchers, who have already contributed a more
expressive formalism that enables learning the positive and negative parts of a
concept together. The learned theory is then an extended logic program with
classical negation (as opposed to a normal program) [?, 37, 38], and a
potential inconsistency in such a theory is resolved by learning priorities
amongst contradictory rules.

A lot of work on LP semantics in the past twelve years, culminating in the
definition of well-founded semantics and stable model semantics, and subsequent
elaborations could be considered in future ILP research, since they allow
dealing with non-stratified programs and 3-valuedness. A lot of work on
knowledge representation and non-monotonic reasoning has been developed on such
a semantic basis. Also, recent work on constructive negation would allow
inducing rules without fear of floundering, and generating exceptions to default
negations which could then be generalized.

Argumentation semantics and procedures are also likely to be useful for com- 
posing rules learnt separately from several sources, algorithms, strategies. 

The work in LP on preferences [5, 6] is bound to be of interest when combining
rules, and even more so because user preferences might be learnt from instances
of user choice and rejection. This may turn out to be crucial for information
gathering on the basis of user preferences. Fuzzy LP may become important in
the future for fuzzifying such induced preference rules, as may generalized
annotated programs (GAPs) [?], which allow for different degrees of
contradiction to be expressed.

Moreover, the implementational techniques of tabling in LP have matured
and prove quite useful [?]. In ILP they may save a lot of recomputation because
results are memoized in an efficient way. Indeed, in ILP each time a clause
is abstracted or refined it has to be tested again against the evidence, even
though many literals in the clause, and certainly the background knowledge, are
unchanged, so that the same computation is repeated. This is even more
important when learnt programs become deeper, i.e. not shallow.
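
The saving can be illustrated with plain memoization, which is only an analogy
to tabling in a Prolog engine and not an implementation of it. In the sketch
below the truth value of each literal on each example is cached, so that
testing a refined clause re-evaluates only the newly added literal. The literal
names and data come from the family example of Section 2.2; the function names
and the call counter are hypothetical.

```python
from functools import lru_cache

PARENT = frozenset({("ann", "mary"), ("ann", "tom"),
                    ("tom", "eve"), ("tom", "ian")})
FEMALE = frozenset({"ann", "mary", "eve"})
calls = {"count": 0}  # counts real evaluations, not cache hits


@lru_cache(maxsize=None)
def literal_holds(literal, x, y):
    """Evaluate one body literal for the example daughter(x, y)."""
    calls["count"] += 1
    if literal == "female(X)":
        return x in FEMALE
    if literal == "parent(Y,X)":
        return (y, x) in PARENT
    raise ValueError(f"unknown literal: {literal}")


def covers(body, example):
    """A clause body covers an example if all its literals hold."""
    x, y = example
    return all(literal_holds(lit, x, y) for lit in body)


EXAMPLES = [("mary", "ann"), ("eve", "tom"), ("tom", "ann"), ("eve", "ann")]
general = ("female(X)",)
refined = ("female(X)", "parent(Y,X)")  # one literal added by refinement

covered_general = [e for e in EXAMPLES if covers(general, e)]
after_general = calls["count"]
covered_refined = [e for e in EXAMPLES if covers(refined, e)]
after_refined = calls["count"]
print(after_general, after_refined, covered_refined)
```

Testing the refined clause triggers new evaluations only for parent(Y,X); the
results for female(X) are taken from the cache.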

Other Challenging Application Areas for ILP

Molecular biology. At present, molecular biology applications of ILP have
come closest to practical relevance. Among the first was protein secondary
structure prediction, followed by predicting drug activity through modeling
structure-activity relations and predicting the mutagenicity of aromatic
and heteroaromatic nitro-compounds. In these problems, which are of immediate
practical interest, results that are better than or equal to the best
previously known results have been obtained, in addition to understandable and
relevant new knowledge. Recent ILP applications in the area of molecular
biology include prediction of rodent carcinogenicity bioassays, modeling
structure-activity relations for modulating transmembrane calcium movement,
pharmacophore discovery for ACE inhibition and diterpene structure elucidation.
At present, PDB/SCOP and SwissProt data sources can be used for data analysis.
In the future there is a lot of potential for ILP applications using data from
the human genome project, which produces vast amounts of data that need to be
analysed.

Agents, personalized SW applications and skill acquisition. The Internet
is an immense source of information, and data analysis of Internet resources is
one of the greatest present challenges. Internet data analysis agents are thus one
of the big challenges for future research. This challenge is further explained and
elaborated in Section 3.2 under the header ‘Continuous learning from global
datasets’. In addition, personalized SW agents will be developed, by learning
from past user behaviour (e.g., user’s habits when browsing the web). There are
also challenging skill acquisition applications, including control tasks, such as
learning from simulation data the operational skill needed for controlling
an aircraft, or the robot skills needed for playing football in a RoboCup
competition.

Medical consultants. In the fifth framework documents medicine is poin- 
ted out as an important issue in making the information society more friendly to 
the citizen. In this sense, learning and answering queries from medical resources 
available on the Web will become increasingly important. This involves teleme- 
dicine for home monitoring of patients with chronic diseases, as well as ‘second 
opinion’ consultancy for citizens when dealing with their daily health problems, 
and experts when dealing with rare and difficult cases. 

Other challenging applications. Much future research is expected in
information retrieval and text mining, analysis of music and multimedia data 
mining, as well as relational knowledge discovery applications in finance, e- 
commerce, banking, ecology, and many others. 



3.2 Challenges in ILP Research 
Short-Term Research Issues 

The following advancements of ILP will be needed to successfully deal with 
the above application challenges. 
ILP as a methodology for first-order learning. As shown in Section 2,
ILP has already developed numerous useful techniques for relational knowledge 
discovery. A recent research trend in ILP is to develop algorithms implementing 
all the most popular machine learning techniques in the first-order framework. 
Already developed techniques upgrading propositional learning algorithms include
first-order decision tree learning, first-order clustering, relational
genetic algorithms, first-order instance-based learning, first-order
reinforcement learning and first-order Bayesian classifiers. It is expected that
the adaptation of propositional machine learning algorithms to the first-order 
framework will continue also in the areas for which first-order implementations 
still do not exist. This should provide a full scale methodology for relational 
data mining based on future ILP implementations of first-order Bayesian net- 
works, first-order neural networks, possibly first-order fuzzy systems and other 
ILP upgrades of propositional machine learning techniques. 

Improved robustness and scaling-up of ILP algorithms. This involves
the development of robust learning algorithms (w.r.t. noise, missing informa- 
tion, ...), the development of standards for data and knowledge representation, 
standards for parameter settings, on-line transformers between different data 
formats, improved efficiency of learners, and the capacity of dealing with large 
datasets. 

Multi-strategy learning and integration. The present data mining ap- 
plications already require data analysis to be performed by different machine 
learning algorithms, aimed at achieving best learning results. Multistrategy lear- 
ning has shown that best results can be achieved by a combination of learning 
algorithms or by combining the results of multiple learners. Current simple and 
popular approaches involve the well-known bagging and boosting that employ 
redundancy to achieve better classification accuracy [4,29,54].
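
As a rough illustration of how bagging employs redundancy, the sketch below trains several copies of a very simple learner on bootstrap resamples and combines them by majority vote. The base learner (a one-dimensional threshold "stump"), the data and all function names are assumptions made for this example; it is a propositional toy, not an ILP system.

```python
import random
from collections import Counter


def train_stump(sample):
    """Learn a 1-D threshold classifier: predict 1 if x > threshold."""
    xs = sorted(x for x, _ in sample)
    # Candidate thresholds are midpoints between neighbouring sample values.
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])] or [xs[0]]

    def errors(th):
        return sum((x > th) != bool(y) for x, y in sample)

    return min(candidates, key=errors)


def bagging(data, n_models=11, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]


def predict(thresholds, x):
    """Majority vote over the individual stumps."""
    votes = Counter(int(x > th) for th in thresholds)
    return votes.most_common(1)[0][0]


data = [(0.0, 0), (1.0, 0), (2.0, 0), (3.0, 0),
        (7.0, 1), (8.0, 1), (9.0, 1), (10.0, 1)]
model = bagging(data)
```

Boosting differs in that later learners are trained on reweighted examples that earlier learners misclassified; it is not shown here.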

More sophisticated approaches will require the integration of different learners
into KDD tools, standard statistical tools and SW packages (like Excel)
and into SW packages routinely used in particular applications. Integrated
machine learning will have to be based also on a better understanding of the 
different types of problem domains and characteristics of learning algorithms 
best suited for the given data characteristics. 

Mixing of different rules by the use of LP techniques also allows for combining
multi-strategy and multi-source learning in a declarative way. Some of the
existing techniques are inspired by contradiction removal methods originating
in LP, others rely on recent work on updating LP programs with one another
[1]. LP based combination techniques may become more important in the near
future.

Hierarchically structured learning and predicate invention. Learning 
from ‘flat’ datasets nowadays typically results in ‘flat’ hypotheses that involve 
no intermediate structure and no constructive induction/predicate invention. 
Despite substantial research efforts in this area challenging results can still be 
expected. 
Criteria for the evaluation of hypotheses. Apart from the standard accuracy
measure, other evaluation measures need to be developed. This is of particular
importance for descriptive ILP systems that lack such measures for the
evaluation of results. Measures of similarity, distance measures, interestingness, 
precision, measures for outlier detection, irrelevance, and other heuristic criteria 
need to be studied and incorporated into ILP algorithms. 

Criteria for the relevance of background knowledge. Background kno- 
wledge and previously learned predicate definitions should be stored for further 
learning in selected problem areas. One should be aware, however, that an
increased volume of background knowledge may also have undesirable properties:
not only will learning become less efficient but, given irrelevant information,
the results of learning will also be less accurate. Therefore it is crucial to formulate
criteria for evaluating the relevance of background knowledge predicates before
they are allowed to become part of a library of background knowledge predicates
for a given application area.

Learning from temporal data. ILP is to some extent able to deal with 
temporal information. However, specialized constructs should be developed for 
applications in which the analysis of a current stream of time labelled data re- 
presents an input to ILP. Experience from the area of temporal data abstraction 
could be used to construct higher-level predicates summarizing temporal pheno- 
mena. 



Long-Term Research Issues 

Some of the issues discussed in this section are relevant to ILP only, whereas 
others are relevant to machine learning in general. Some of these issues have 
been pointed out as important already by Tom Mitchell in the article published 
in the Fall 1997 issue of the AI Magazine [43].

Analysis of comprehensibility. It is often claimed that for many appli- 
cations comprehensibility is the main factor if the results of learning are to be 
accepted by the experts. Despite these claims and some initial investigations 
of intelligibility criteria for symbolic machine learning (such as the standard 
Occam’s razor and minimal description length criteria) there are few research 
results concerning the intelligibility evaluation by humans. 

Building specialized learners and data libraries. Particular problem 
areas have particular characteristics and requirements, and not all learning al- 
gorithms are capable of dealing with these. This is a reason for starting to build 
specialized learners for different types of applications. In addition, libraries of 
‘cleaned’ data, background knowledge and previously learned predicate definiti- 
ons should be stored for further learning in selected problem areas. Notice that 
such libraries are currently being established for selected problem areas in mole- 
cular biology. This approach will lead to the reusability of components and to 
extended example sets that should be achieved also as the result of systematic 
query answering and experimentation that is part of ‘continuous’ learning as 
discussed below. 
Continuous learning from ‘global’ datasets. Under this heading we understand
the requirement for learning from various data sources, where data
sources can be of various types, including propositional and relational tables, 
textual data, and hypermedia data including speech, images and video. This 
involves also the issue of globality, i.e., learning from local datasets as well as 
referential datasets collected and maintained by the world’s best experts in the
area, referential case bases of ‘outlier’ data as well as data that is publicly avai- 
lable via WWW. Achieving the requirement of continuous and global learning 
will require also learning agents for permanent learning by theory revision from 
updated world-wide data, as well as the development of query agents that will be 
able to access additional information from WWW via query answering (invoked 
either by experts or by automatically extracting answers from WWW resources, 
possibly by invoking learning and active experimentation). Query agents may 
involve dynamic abductive querying on WWW. 
Abstract. With the ever growing usage of the world wide IT networks, 
agent technologies and multiagent systems (MAS) are attracting more 
and more attention. Multiagent systems are designed to be open systems, 
whereas agent technologies aim at the design of agents that perform well
in environments that are not necessarily well-structured and benevolent. 
Emergent system behaviour is one of the most interesting phenomena 
one can investigate in MAS. However, there is more to MAS design 
than the interaction between a number of agents. For an effective sy- 
stem behaviour we need structure and organisation. This paper presents 
basic concepts of a theory for holonic multiagent systems with the aim 
to define the building blocks of a theory that can explain organisation 
and dynamic reorganisation in MAS. In doing so it tries to bridge the 
well-known micro-macro gap in MAS theories. The basic concepts are il- 
lustrated with three application scenarios: flexible manufacturing, order 
dispatching in haulage companies, and train coupling and sharing. 



1 Introduction 

The increasing importance of the world wide telecommunication and computer 
networks, especially the Internet and the World Wide Web (WWW) is one of 
the reasons why agent technologies have attracted so much attention in the past 
few years. Although in many of today's applications individual agents are trying
to fulfil a task on behalf of a single user, these agents are doing so in a multiagent
context. It is obvious that the problem solving capabilities of multiagent systems
(MAS), which have been developed since research on MAS began in the late 70s,
will become more and more important. However, the implementation of MAS
for interesting real-world application scenarios still tends to be very complex.
The basic approach to tackling this complexity is to base problem solving on
emerging bottom-up behaviour. This is achieved by giving the agents specific
abilities, which leads to emergent problem solving behaviours when the agents
interact with each other. Although this approach works well in many cases, the 
solutions tend to be sub-optimal. 

With his Watchmaker’s parable Simon demonstrated that a hierarchy offers 
another useful paradigm for tackling complexity. The hierarchical solution to
a global problem is built up from modules which form stable sub-solutions, allowing
one to construct a complex system out of less complex components. Control
in such a hierarchy can be designed in a centralised or a decentralised manner. 
The decentralised model offers robustness and agility with respect to uncertain- 
ties in task execution. The major advantages of the introduction of centralised 
planning and control instances are predictability, opportunities for performance 
optimisation, and an easier migration path from current to distributed systems.

To design and implement systems that include both hierarchical organisational
structures and decentralised control, the concepts of fractal and holonic
design were proposed. The word holon is derived from the Greek holos
(whole) and the suffix on, which means particle or part. A holon is a natural 
or artificial structure that is stable, coherent, and consists of several holons as 
substructures. No natural structure is either whole or part in an absolute sense. 
A holon is a complex whole that consists of substructures as well as being a 
part of a larger entity. In both approaches, in fractal as well as holonic design, 
we have the ideas of recursively nested self-similar structures which dynamically 
adapt themselves to achieve the design goals of the system. We adopt the notion 
of holonic multiagent systems to transfer these ideas to the design of MAS. In 
a holonic MAS autonomous agents group together to form holons. However, in 
doing so they do not lose their autonomy completely. The agents can leave
a holon again and act autonomously or rearrange themselves as new holons. 
According to this view a holonic agent consists of sub-agents, which can separate 
and rearrange themselves and which may themselves be holons. 

The paper starts by presenting some basic definitions of how holonic struc- 
tures can be identified in a MAS and how these structures are organised. The 
usefulness of proposed concepts is then demonstrated by three application scena- 
rios: (1) flexible manufacturing, (2) order dispatching in haulage companies, and 
(3) train coupling and sharing. These application scenarios differ in the sense 
that in (1) holons are formed dynamically because agents with different abilities 
have to work together to achieve a common goal. In (2) the abilities of the agents 
partly overlap. Last but not least, in (3) all agents have the same abilities. 



2 Definition of Holonic Multiagent Systems 

To describe a MAS for a given problem, a set of cooperating agents is specified.
The static description of the MAS is given by the prototype agent system
$AS_{prot} := (A_{prot}, ADS_{prot})$, where

$A_{prot}$ is the set $\{A_1, \ldots, A_n\}$, $n \in \mathbb{N}$, of prototypical agents, the agents which
may arise dynamically in the system. These agents are the potentially
available problem solvers. Several instances of a specific prototypical
agent can be created.

$ADS_{prot}$ is a specialised prototypical agent providing an agent directory service.
Fig. 1. Static specification ASprot of the solution for a problem and dynamic execution 

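The process of problem solving starts with the initial agent system
$AS_{init} = (A_{init}, ADS_{init})$ where

$A_{init} = \{A_1^1, \ldots, A_1^{k_1}, \ldots, A_n^1, \ldots, A_n^{k_n}\}$, $k_1, \ldots, k_n \in \mathbb{N}$ and

$\forall i,j \in \mathbb{N}: A_i^j \in A_{init}: A_i \triangleright A_i^j \wedge A_i \in A_{prot}$.

$A \triangleright A'$ denotes that $A'$ is an instance of the prototypical agent $A$, which means
that $A'$ inherits its behaviour and initial knowledge from $A$ but also gets some
additional knowledge, for example its unique identification, which can be
used as an address to communicate with $A'$. From $AS_{init}$ the dynamic MAS
$AS_t$ evolves:

$AS_t = (A_t, ADS_t)$ where

$A_t = \{A_1^1, \ldots, A_1^{l_1}, \ldots, A_n^1, \ldots, A_n^{l_n}\}$, $l_1, \ldots, l_n \in \mathbb{N}$ and

$\forall i,j \in \mathbb{N}: A_i^j \in A_t: A_i \blacktriangleright A_i^j \wedge A_i \in A_{prot}$.

$\blacktriangleright$ denotes $\triangleright \circ \rightarrow^*$, which means that we have $A_i \triangleright A'$ where $A' \in A_{init}$
and $A' \rightarrow^* A_i^j$, where $\rightarrow$ denotes the transformation of an agent by a single step of
computation.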
The computation goes on while the agents send messages to each other.
New agents may be introduced and some of the active agents might get killed.
Each agent has a unique identification, i.e. address, which can be used in
messages and thus allows any desired organisation of the MAS to be achieved. All
agents automatically know the identification of the ADS agent. An agent
can make its own identification accessible to all of the other agents by registering
at the ADS agent. In the example of Fig. 1 the system started with an initial
agent system consisting of two agent instances and the ADS agent. One of the
initial agents creates two further agents and the other creates one. By time t,
identifications have been passed on in two ways: directly, when one agent tells
another agent its own identification or that of a third agent, and indirectly, when
an agent registers its identification at the ADS agent and another agent
extracts this information from the ADS agent.
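
The two ways of learning an identification (being told directly, or looking it up at the ADS agent) can be sketched as follows. This is only an illustrative model under stated assumptions: the class names, the use of integer identifications, and the rule that a creator automatically knows the agents it creates are hypothetical and not taken from the paper.

```python
import itertools


class ADS:
    """Agent directory service: maps a public name to an agent identification."""

    def __init__(self):
        self._directory = {}

    def register(self, name, agent_id):
        self._directory[name] = agent_id

    def lookup(self, name):
        return self._directory.get(name)


class Agent:
    _ids = itertools.count(1)

    def __init__(self, name, ads):
        self.name = name
        self.id = next(Agent._ids)  # unique identification, used as address
        self.ads = ads              # every agent knows the ADS agent
        self.known = {}             # name -> identification of acquaintances

    def create(self, name):
        child = Agent(name, self.ads)
        self.known[name] = child.id  # assumption: a creator knows its creations
        return child

    def tell(self, other, name, agent_id):
        """Pass an identification on to another agent directly."""
        other.known[name] = agent_id

    def register(self):
        self.ads.register(self.name, self.id)

    def find(self, name):
        """Extract an identification from the ADS agent, if it is registered."""
        agent_id = self.ads.lookup(name)
        if agent_id is not None:
            self.known[name] = agent_id
        return agent_id


ads = ADS()
a, b = Agent("a", ads), Agent("b", ads)  # two initial agents
c = a.create("c")                        # a creates a further agent
b.register()                             # b publishes its identification
found = c.find("b")                      # c extracts it from the ADS agent
a.tell(c, "a", a.id)                     # a tells c its own identification
```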

At any given point in time we can identify holonic structures in the dynamic
MAS $AS_t$. A holon $H$ behaves in its environment like any of the agents in $A_t$.
However, when we take a closer look it might turn out that $H$ is itself built up of
a set of holons. Let $atomic: \mathcal{H} \rightarrow \mathbb{B}$ be a function that tells us whether
a given holon $H$ is built up of other holons or whether $H \in A_t$. For the set of
holons $\mathcal{H}$ we then have:

$$\forall H \in \mathcal{H}: \begin{cases} atomic(H): & H \in A_t \\ \neg atomic(H): & \forall h_i \in H: h_i \in \mathcal{H} \end{cases} \qquad (1)$$
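
The recursive structure expressed by (1) can be sketched as a small data type: a holon is either atomic (a plain agent) or consists of parts that are themselves holons. The class and the example names below are hypothetical and serve only to illustrate the definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Holon:
    name: str
    parts: List["Holon"] = field(default_factory=list)

    def atomic(self) -> bool:
        """An atomic holon is a plain agent; it has no sub-holons."""
        return not self.parts

    def agents(self) -> List[str]:
        """Names of all atomic agents reachable from this holon."""
        if self.atomic():
            return [self.name]
        return [a for part in self.parts for a in part.agents()]


robot = Holon("robot")
cell = Holon("cell", [Holon("controller"), Holon("machine")])
shop = Holon("shop", [cell, robot])
```

Here `shop` is a non-atomic holon whose parts are again holons, one of which (`cell`) is itself non-atomic.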



Like any agent a holon has a unique identification so that it is possible to
communicate with the holon by just sending messages to its address. In MAS
the modelling of holonic agents can be realised by commitments (see e.g. [14])
of the subagents to cooperate and work towards a common goal in a corporate
way. We thus investigate holonic structures that are formed because the entities
in a holon have to co-operatively solve a set of tasks. Because of this common
interest the entities form shared goals and shared intentions. We use the
term shared because we do not want to go into the details of the discussion
whether joint knowledge can or cannot be achieved in a MAS [7]. We assume
that shared means that the agents are aware of the fact that other agents are
involved and that these agents maintain knowledge, belief, goals, and intentions
about the goals and intentions that are believed to be shared. More formally
we can say that all sub-holons $h_i, i \in \mathbb{N}$ of a holon $h$ commit themselves to
co-operatively execute action $a$ where each of the $h_i$ performs action $a_i$ and we
assume that at any point in time $t$

$$\bigwedge_{h_i \in h} succeeded(a_i, t) \rightarrow succeeded(a, t) \qquad (2)$$

holds. To execute the action $a$ is a shared intention of the sub-holons $h_i$ of $h$.
Because of (2) we have

$$\bigwedge_{h_i \in h} achieved(intent(execute(a_i), t)) \rightarrow achieved(intent(execute(a), t)) \qquad (3)$$
What makes it difficult to describe the holonic structures further is that
the tasks which have to be executed by the agents become known in the system
some time before the point at which the execution starts. This implies that
the agents have to make commitments on participating in a holon at a point
in time at which the execution of the task lies in the future. This means that
we can distinguish between the period in which the agents negotiate about
which holon they want to participate in and the actual execution of the tasks.
However, this distinction does not have any influence on the actual structure of 
the holon, since the holonic structure is formed when the commitments for the 
shared intentions are made. 

For the settings in this paper a new holon is formed as soon as at least two agents
commit themselves to co-operatively perform an action. The holonic structure 
remains at least until either the joint action is performed or the commitment to 
do the joint action is retracted. 
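
The formation rule just stated can be sketched as a tiny state tracker. The class is a hypothetical helper, and it makes two simplifying assumptions that go beyond the text: the holon is dissolved exactly when the joint action is performed (the text only says it remains "at least" until then), and a retraction by one agent dissolves the holon only if fewer than two committed agents remain.

```python
class JointAction:
    """Tracks the commitments of agents to one joint action (hypothetical helper)."""

    def __init__(self, name):
        self.name = name
        self.committed = set()
        self.done = False

    def commit(self, agent):
        self.committed.add(agent)

    def retract(self, agent):
        self.committed.discard(agent)

    def perform(self):
        self.done = True

    def holon_exists(self):
        # A holon exists while at least two agents are committed
        # and the joint action has not yet been performed.
        return not self.done and len(self.committed) >= 2


joint = JointAction("move_workpiece")
joint.commit("robot")
one_commitment = joint.holon_exists()
joint.commit("conveyor")
two_commitments = joint.holon_exists()
joint.perform()
after_execution = joint.holon_exists()
```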

Although it would be possible to organise the holonic structures in a com- 
pletely decentralised manner, for efficiency reasons it is more effective to use an 
individual agent to represent a holon. In some cases it is possible to select one of 
the already present agents as the representative of the holon by using an election 
procedure. In other cases a new agent is explicitly introduced to represent the 
holon just for its lifetime. In both cases the representative agent has to have the 
ability to represent the shared intentions of the holon and to negotiate about 
these intentions with the outside world and with the agents internal to the holon. 

3 Applications 

In this section we illustrate the formation of shared intentions in three application 
scenarios: (1) flexible manufacturing, (2) order dispatching in haulage companies, 
and (3) train coupling and sharing. These application scenarios differ in the sense 
that in (1) holons are formed dynamically because agents with different abilities 
have to work together to achieve a common goal. In (2) the abilities of the agents 
partly overlap. Last but not least, in (3) all agents have the same abilities. 

3.1 Holonic Structures in a Flexible Manufacturing System 

There are already well-established layers of abstraction in the control of a flexible
manufacturing system (FMS) (see Fig. 2): production planning and control
(PPC), shop floor control (SFC), flexible cell control (FCC), autonomous system
control, and machine control. Each of these layers has a clearly defined scope of
competence. In Fig. 2 we can see holons on each of the five layers: at the lowest
layer, the physical body of an autonomous system (i.e. an autonomous robot or 
a machine tool) together with its controlling agent. On the layer of the flexible 
cells we have the flexible cell control system together with the holons that are 
formed by the physical systems that belong to the flexible cell. On the SFC 
layer we have the agent that represents the SFC for a specific production unit 
together with all the holons that belong to it. Finally, on the enterprise layer we 
Fig. 2. Planning and control layers in a flexible manufacturing system. 



have all the holons that are present at a specific site of the company. We can 
even go further and represent holons of companies with several sites. However, 
in this paper we do not want to further elaborate on this. Most of the holonic 
structures which were just described are quite stable. However, especially on the 
layer of the flexible cells it is very important to have efficient mechanisms to 
dynamically form new holons. With the description of this situation we have 
the conceptual problem that autonomous systems such as mobile robots might 
interact with flexible cells. To have a more uniform view we assume that pure 
autonomous systems such as mobile robots and autonomous (guided) vehicles 
are represented as holons on the FCC layer, too. We refer to all these systems 
as flexible cell holons (FCH). 

The SFC system passes tasks to the lower FCC layer as soon as it is de- 
termined by the production plan that a task can be executed because all of 
the preceding steps in the working plan have been successfully completed. From 
these tasks the FCHs on the lower layers derive their shared and local intentions. 
The SFC system does not care whether it is possible for a group of FCHs to exe- 
cute this task immediately or if they are currently engaged in the execution of a 
task. The SFC system just inserts the task into a list which is accessible to all of 
the FCHs and the FCHs decide by themselves when it will actually be executed. 
To perform a specific task, several FCHs have to co-operate. Each FCH has to
play a part to solve a specific task. No FCH may believe that it is the only one
that wants to play a certain part for a specific task. Therefore, the FCHs must 
co-ordinate their intentions to play parts in different tasks. The main problem to 
be solved is to find a consistent group of FCHs which together are able to solve 
a specific task. We call such a group a complete holon for a task. Only tasks for 
which a complete holon is formed can actually be executed. 

The FCHs can be separated into three groups: $\mathcal{R}$ mobile manipulation systems
(mobile robots), $\mathcal{T}$ transport systems, and $\mathcal{C}$ flexible cells such as machining
centres that might have locally fixed robots. In some settings even the
workpieces which are to be processed are able to move autonomously, for example
when they are installed on a transportation device. In these settings it is
reasonable to control these workpieces by FCHs. We therefore introduce the set
of workpieces $\mathcal{W}$.

Mobile manipulation systems are able to work flexibly on the given tasks.
Each time a mobile manipulation system finishes the execution of a task it can
start working on any task it is able to perform, regardless of its current configuration.
Locally fixed robots, machining centres, and flexible cells are much more restricted
in their ability to choose tasks to be executed than FCHs in $\mathcal{R} \cup \mathcal{T}$ because FCHs
in $\mathcal{C}$ have a fixed location. An FCH $f$ in $\mathcal{C}$ depends on FCHs of $\mathcal{R} \cup \mathcal{T}$ if not all the
devices needed for a specific task are already present within $f$. We therefore
introduce the precedence relation $\mathcal{W} \prec \mathcal{C} \prec \mathcal{T} \prec \mathcal{R}$. $\mathcal{T} \prec \mathcal{R}$ means that a member
of $\mathcal{R}$ may only join a holon for a specific task if all of the members of set
$\mathcal{T}$ have already joined the holon for this specific task. The precedence relation
$\prec$ is transitive, which means that, for example, $\mathcal{W} \prec \mathcal{R}$ is valid too. The idea
behind this definition is that the FCHs which are able to execute tasks flexibly
may react to the decisions of FCHs which lack this flexibility in task execution.
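
A minimal sketch of the join rule implied by the precedence relation follows. Encoding the groups as integer ranks (which makes the relation transitive by construction) and restricting the check to the FCHs a given task actually requires are assumptions of this example; the group letters stand for workpieces (W), flexible cells (C), transport systems (T) and mobile robots (R), and the FCH names are hypothetical.

```python
# Group order W < C < T < R, encoded as ranks (an assumed representation).
RANK = {"W": 0, "C": 1, "T": 2, "R": 3}


def precedes(g1, g2):
    """True if group g1 precedes group g2; transitive because ranks are ordered."""
    return RANK[g1] < RANK[g2]


def may_join(candidate_group, required, joined):
    """
    required: dict mapping FCH name -> group for all FCHs the task needs.
    joined:   set of FCH names that have already joined the holon.
    A candidate may join only once every required FCH of a preceding group has joined.
    """
    return all(
        name in joined
        for name, group in required.items()
        if precedes(group, candidate_group)
    )


required = {"w1": "W", "c1": "C", "t1": "T", "r1": "R"}
workpiece_first = may_join("W", required, set())
robot_too_early = may_join("R", required, {"w1", "c1"})
robot_last = may_join("R", required, {"w1", "c1", "t1"})
```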

To find a complete holon, the FCHs examine the list of tasks, which are 
announced by the SFC system, and try to reserve the task they would like to 
execute next for themselves. When an FCH is able to reserve a task successfully 
for itself, this FCH becomes the representative of the holon for this task. The 
representative r of a holon for a task t has the responsibility to complete the holon
for this task. The representative does this by sending messages to the other FCHs which ask these
FCHs to join the holon for task t. A conflict occurs if two representatives send 
each other messages in which each of them asks the other one to join its own 
holon. It is possible to describe conflict resolution protocols for this situation 
which guarantee liveness and fairness of the system. 
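
The reservation step can be sketched as an atomic operation on the shared task list, so that at most one FCH becomes the representative for a task. The paper does not spell out its conflict resolution protocol here; the tie-breaking rule below (the representative with the smaller identification keeps its holon) is purely an assumed placeholder, and this sketch makes no claim about liveness or fairness.

```python
import threading


class TaskList:
    """Shared list of announced tasks; reserve() is atomic, so one FCH wins per task."""

    def __init__(self, tasks):
        self._lock = threading.Lock()
        self._representative = {task: None for task in tasks}

    def reserve(self, task, fch):
        with self._lock:
            if self._representative[task] is None:
                self._representative[task] = fch
                return True
            return False

    def representative(self, task):
        return self._representative[task]


def resolve_conflict(rep_a, rep_b):
    """Two representatives invited each other; assumed rule: the smaller id wins."""
    return min(rep_a, rep_b)


tasks = TaskList(["drill_part_7"])
first_try = tasks.reserve("drill_part_7", "fch1")
second_try = tasks.reserve("drill_part_7", "fch2")
winner = resolve_conflict("fch2", "fch1")
```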

3.2 TeleTruck 

Order dispatching in haulage companies is a complex task. For a system solving 
the real-world problem it is not enough to provide a system that computes routes 
for a fleet of trucks for a given set of customer orders. A system supporting the 
real world setting has to cope with an online scheduling problem, in which at 
any point in time new orders can arrive and in which the system is able to react 
to problems in the execution of the computed plans. The TeleTruck system
(see Fig. 3) implements an online dispatching system using telecommunication
Fig. 3. The online fleet scheduling system TeleTruck. 



technologies (e.g., satellite and mobile phone communication and global positioning).
It has been demonstrated that a MAS approach to modelling an online dispatching
system is feasible and can compete with operations research approaches with
respect to the quality of the provided solution. However, in these scientific settings
the transportation is done by self-contained entities. In practice we see that
truck drivers, trucks, and (semi-)trailers are autonomous entities with their own
objectives. Only an appropriate group of these entities can together perform the
transportation task. For this reason a holonic approach had to be used to model
the agent society of TeleTruck [3].

For each of the physical components (trucks, truck tractors, chassis, and
(semi-)trailers) of the forwarding company as well as for each of its drivers
there is an agent, which administers the resources the component or the driver
supplies. These agents have their own plans, goals, and communication facilities 
in order to provide their resources for the transportation plans according to 
their role in the society. The agents have to form appropriate holons in order to 
execute the orders at hand. 

Building a new holon is not just about collecting the needed resources. The 
components that merge to a holon have to complement each other and match 
the requirements of the transportation task. For each component an incompatibility
list is maintained that specifies its incompatibilities with other components,
properties of components or orders. These constraints represent technical and 
legal restrictions and demands for holons. 
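
One possible reading of such incompatibility lists is sketched below: each component carries a set of properties and a set of things it is incompatible with (other components by name, properties of components, or properties of the order). The data layout and all names are hypothetical and only illustrate the kind of check a holon would need before merging components.

```python
from itertools import combinations


def compatible(components, order_props):
    """
    components:  dict name -> {"props": set of str, "incompatible": set of str}
    order_props: set of properties of the transportation order.
    Returns True if no incompatibility list is violated.
    """
    for (n1, c1), (n2, c2) in combinations(components.items(), 2):
        # Incompatible with the other component itself.
        if n2 in c1["incompatible"] or n1 in c2["incompatible"]:
            return False
        # Incompatible with a property of the other component.
        if c1["incompatible"] & c2["props"] or c2["incompatible"] & c1["props"]:
            return False
    # Incompatible with a property of the order.
    return not any(c["incompatible"] & order_props for c in components.values())


vehicle = {
    "truck": {"props": {"motor"}, "incompatible": {"frozen_goods"}},
    "trailer": {"props": {"loading_space"}, "incompatible": set()},
    "driver": {"props": {"licence"}, "incompatible": {"hazardous_goods"}},
}
dry_ok = compatible(vehicle, {"dry_goods"})
frozen_ok = compatible(vehicle, {"frozen_goods"})
```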
The main things that need to be agreed between agents participating in a 
vehicle holon are to go to a specific place at a specific point in time and to 
load and unload goods. From these activities shared intentions for the agents 
participating in the vehicle holon can be derived. A new agent, a
Plan’n’Execute Unit (PnEU), is explicitly introduced for the vehicle holon
to maintain the shared intentions of the vehicle holon. The PnEU coordinates
the formation of the holon representing the transportation entity and plans the 
vehicle’s routes, loading stops, and driving times. The PnEU represents the 
transportation holon to the outside and is authorised to reconfigure it. A PnEU 
is equipped with planning, coordination, and communication abilities, but does 
not have its own resources. Each transportation holon that has at least one task 
to do is headed by such a PnEU. Additionally, there is always exactly one idle 
PnEU with an empty plan that coordinates the formation of a new holon from 
idle components if needed. 

For the assignment of the orders to the vehicle holons a bidding procedure is
used. The dispatch officer in the shipping company interacts with a dispatch
agent. The dispatch agent announces the newly incoming orders, specified by
the dispatch officer, to the PnEUs via an extended contract net protocol (ECNP).
The PnEUs request resources from their components and decide whether
the resources are sufficient to fulfil the task or not. If they are sufficient, the
PnEU computes a plan, calculates its costs, and bids for the task. If the resources
supplied by the components that are already members of the holon are not
sufficient (which is trivially the case for the idle PnEU), the task together with
the list of missing resources and a set of constraints that the order or the other 
members of the holon induce is announced to those agents which could supply 
such resources. These agents calculate which of their resources they can actually 
supply, and again announce the task and the still missing resources. This is 
iterated until all the needed resources are collected. The task of collecting the 
needed resources is not totally left to the PnEU because the components have 
local knowledge about how they can be combined, e.g. if a driver always drives 
the same truck it is local knowledge of the components and not of the PnEU. 
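
The iterated collection of missing resources can be approximated by the following greatly simplified sketch. It is not the ECNP itself: there are no announce, bid or grant messages, components are visited in a fixed order rather than negotiating among themselves, and the flat cost per component is an invented placeholder. All names and resource labels are hypothetical.

```python
def collect(needed, idle_components):
    """
    Greedy sketch: ask idle components in turn what they can supply and pass the
    still missing resources on. Returns the chosen component names, or None if
    the needed resources cannot be covered.
    """
    missing = set(needed)
    chosen = []
    for name, supplies in idle_components.items():
        if not missing:
            break
        offer = missing & supplies
        if offer:
            chosen.append(name)
            missing -= offer
    return chosen if not missing else None


def bid(head_resources, needed, idle_components, cost_per_component=10):
    """Return a bid for the task, or None if the holon cannot be completed."""
    missing = set(needed) - set(head_resources)
    extra = collect(missing, idle_components)
    if extra is None:
        return None  # resources cannot be collected, so no bid is made
    return {"components": extra, "cost": cost_per_component * len(extra)}


idle = {
    "truck": {"motor", "loading_space"},
    "driver": {"driving"},
    "trailer": {"loading_space", "chassis"},
}
# The idle PnEU has no resources of its own.
idle_pneu_bid = bid(set(), {"motor", "loading_space", "driving"}, idle)
impossible_bid = bid(set(), {"crane"}, idle)
```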

Thus the ECNP is used on the one hand by the dispatch agent to allocate 
tasks to the existing vehicle holons and on the other hand by the free PnEU 
which, itself, uses the protocol to form a new holon. The ECNP the dispatch 
agent initiates serves to allocate a transportation order to one or more vehicles. 
The semantics of the announcement is the invitation to the vehicles to bid for and 
execute a task. Fig. 4 shows a dispatch agent announcing a new transportation 
task to two vehicle holons and the idle PnEU. The complete vehicle holon on 
the left hand side cannot incorporate any further components. Hence, the head 
requests the necessary resources from its components and, if these resources 
are sufficient, calculates the costs for the execution of the task. The second 
vehicle holon could integrate one further component. However, its head first tries 
to plan the task using only the resources of the components the holon has 
already incorporated. If the resources are not sufficient, the head tries to collect 
the missing resources by performing an ECNP with idle components that supply 
such resources. The idle PnEU, which does not yet have any resources of its own, 
first of all performs an ECNP with those idle components that offer loading 
space; in the example a truck and a trailer. The trailer supplies loading space and 
a chassis; therefore, it needs a motor-supplying component. Hence, it announces 
the task to the truck. The truck, which received two different announcements 
for the same task (one by the trailer and one by the PnEU directly), can bid 
in both protocols since it can be sure that only one of the protocols will be 
successful. Therefore, the truck agent looks for a driver, computes the costs for 
the two different announcements, and submits a bid both to the PnEU and to the 
trailer. Obviously, the costs for executing the task with a vehicle that consists 
of a driver and a truck are less than the costs of executing the same task with 
the same truck and driver and, in addition, a trailer. Therefore, the idle PnEU 
will pass the bid of the truck to the dispatch agent. If the task is granted to the 
idle PnEU, the PnEU merges with the components into a vehicle holon and a new 
PnEU will be created for further bidding cycles. Whenever the plan of a holon 
is finished, the components separate and the PnEU terminates. 

Fig. 4. Holonic Planning in TeleTruck. 

Because the tour plans that are computed in the ECNP procedure are 
sub-optimal, the simulated trading procedure [1] is used to improve the 
sub-optimal initial solution stepwise towards globally optimal plans. Simulated 
trading is a randomised algorithm that realises a market mechanism where the 
vehicles optimise their plans by successively selling and buying tasks. Trading is 
done in several rounds. Each round consists of a number of decision cycles. In 
each cycle the truck agents submit one offer to sell or buy a task. At the end of 
each round the dispatch agent tries to match the sell and buy offers of the trucks 
such that the costs of the global solution decrease. This implements a kind of 
hill-climbing algorithm. As in the case of simulated annealing, a deviation that 
decreases from round to round can be specified such that in early rounds the 
dispatch agent is willing to accept a worsening of the global solution, which 
helps to leave local maxima in the solution space. Nevertheless, maxima that 
are left are saved, such that, when the algorithm terminates before a better 
solution is found, the best solution found so far is returned. Hence, simulated 
trading is an interruptible anytime algorithm. 
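
The acceptance rule and the anytime behaviour can be sketched in a few lines of Python. This is a simplified illustration and not the simulated trading procedure itself: each round contains a single matched sell/buy pair, and the cost model `global_cost`, the linear tolerance schedule and all parameter names are assumptions made for the example.

```python
import random


def global_cost(assignment):
    """Hypothetical cost model: each vehicle pays the square of its total load."""
    return sum(sum(tasks) ** 2 for tasks in assignment)


def simulated_trading(assignment, rounds=200, start_tolerance=5.0, seed=0):
    """Move tasks between vehicles; tolerate small worsenings in early rounds."""
    rng = random.Random(seed)
    current = [list(tasks) for tasks in assignment]
    best, best_cost = [list(tasks) for tasks in current], global_cost(current)
    for r in range(rounds):
        # The tolerated worsening shrinks linearly to zero over the rounds.
        tolerance = start_tolerance * (1 - r / rounds)
        sellers = [i for i, tasks in enumerate(current) if tasks]
        if not sellers or len(current) < 2:
            break
        seller = rng.choice(sellers)
        buyer = rng.choice([i for i in range(len(current)) if i != seller])
        task = rng.choice(current[seller])
        candidate = [list(tasks) for tasks in current]
        candidate[seller].remove(task)
        candidate[buyer].append(task)
        if global_cost(candidate) <= global_cost(current) + tolerance:
            current = candidate
            if global_cost(current) < best_cost:
                # Remember the best solution so that stopping early is safe.
                best = [list(tasks) for tasks in current]
                best_cost = global_cost(current)
    return best, best_cost


start = [[4, 3, 2, 1], [], []]
best, cost = simulated_trading(start)
```

Because the best assignment seen so far is stored separately from the current one, the loop can be interrupted after any round and still return a solution that is no worse than the initial one.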

In order to allow the optimisation not only of the plans but also of the 
combination of components we extended the simulated trading procedure. It might 
be the case that a good route plan is not efficient because the allocation of 
resources to the plan is bad, e.g. a big truck is not full while a smaller truck could 
use some extra capacity to improve its own plan. We divided a trading round 
into three phases. The first phase consists of order trading cycles as explained 
above; in the middle phase the holons can submit offers to exchange components. 
The third phase is, like the first phase, an order trading phase. After the third 
phase is finished the dispatch agent matches the sell and buy offers and the 
component exchange offers. This final trading phase is needed to decide whether 
the exchange of components in the middle phase actually leads to an improvement 
of the global resource allocation. 



3.3 Train Coupling and Sharing 

In this application scenario we have a set of train modules that are able to drive 
on their own on a railway network. However, if all the train modules drive 
separately, the capacity utilisation of the railway network is not acceptable. The 
idea is that the train modules join together and jointly drive some distance 
(train coupling and sharing (TCS) [12], see Fig. 5). The overall goal is to reduce 
the cost for a given set of transportation tasks in a railroad network. Each task is 
specified as a tuple consisting of the origin and the destination node, the earliest 
possible departure time and the latest allowed arrival time. As in the TeleTruck 
system, not all of the tasks are announced to the system at the same time. New 
tasks can come in at any point in time. 

We assume that a set of transportation tasks is given, which have to be 
served by the train modules, and that each task can be served by an individual 
module, i.e. there is no need to hook two or more modules together to serve an 
individual task. Likewise, we also assume that a module cannot serve more than 
one task at a time. All tasks occurring in the system are transportation requests 
in a railroad network, which is represented as a graph consisting of several nodes 
connected via so-called location routes. 

Fig. 5. Routing of Trains in the TCS System 

Whenever a module serves a transportation task, it computes the path from 
the origin to the destination node with a shortest path algorithm. The module 
then rents the intermediate location routes for a certain time window from the 
net manager. The time window for each location route is uniquely determined 
by the earliest departure time and the latest arrival time of the transportation 
task. When a location route is allocated to a certain module, the route is blocked 
for other modules during this time interval. In order to reduce route blocking, 
however, two or more modules can decide to share a particular location route. 
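
A minimal Python sketch of this routing step is given below. It assumes that the shortest path is computed with Dijkstra's algorithm (the text does not say which algorithm is used) and that a location route is an undirected edge; the class `NetManager`, its `rent` method and the example network are hypothetical, and route sharing is not modelled.

```python
import heapq


def shortest_path(graph, origin, destination):
    """Dijkstra's algorithm; `graph` maps node -> {neighbour: length}."""
    queue = [(0, origin, [origin])]
    visited = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == destination:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, length in graph.get(node, {}).items():
            if nxt not in visited:
                heapq.heappush(queue, (dist + length, nxt, path + [nxt]))
    return None


class NetManager:
    """Keeps track of which location routes are rented in which time window."""

    def __init__(self):
        self.rentals = {}  # route (frozenset of two nodes) -> list of (start, end)

    def rent(self, path, window):
        """Rent every route on `path` for `window`; fail if any route is blocked."""
        routes = [frozenset(pair) for pair in zip(path, path[1:])]
        start, end = window
        for route in routes:
            for other_start, other_end in self.rentals.get(route, []):
                if start < other_end and other_start < end:
                    return False  # overlapping window: the route is blocked
        for route in routes:
            self.rentals.setdefault(route, []).append(window)
        return True


# Hypothetical network with three nodes.
graph = {"A": {"B": 1, "C": 5}, "B": {"A": 1, "C": 1}, "C": {"A": 5, "B": 1}}
manager = NetManager()
path = shortest_path(graph, "A", "C")
first = manager.rent(path, (0, 10))          # earliest departure 0, latest arrival 10
second = manager.rent(["B", "C"], (5, 15))   # overlaps the first rental on B-C
third = manager.rent(["B", "C"], (10, 20))   # starts when the first window ends
```

The same window is applied to every route on the path, following the statement that the window is determined by the earliest departure time and the latest arrival time of the task.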



Route sharing means that two or more modules hook together at the 
beginning of a location route (or of a sequence of consecutive routes) and split up 
afterwards. Route sharing has two advantages: firstly, it increases the average 
utilisation of location routes because it enables more than one module to use a 
location route at the same time. Secondly, the cost for renting a location route 
is reduced for an individual module by distributing the full cost among the 
participating modules. 



We have two natural candidates for becoming the agents in the TCS scenario: 
the individual modules and the unions (see Fig. 6 and Fig. 7) that emerge when 
two or more modules decide to share a location route. However, each additional 
abstraction increases the complexity of the resulting implementation, and 
therefore we have decided to model the unions as the agents in our system. A single 
module is encapsulated in a (so-called degenerated) union, and thus we avoid 
the additional complexity in the system design. The advantage of applying this 
scheme is that we do not have to distinguish modules and unions; every active 
entity in the system is a union. 

Fig. 6. Train modules group together in unions. 

Fig. 7. Even unions can group and form recursively nested holons. 



The main reason for the unions to form shared intentions is that they need 
to share location routes to reduce cost. This peer matching is again achieved by 
negotiation processes between the agents in the agent society. As in the 
TeleTruck system, the contract-net protocol is used whenever a new task is 
announced to the system. New tasks are incrementally integrated into the existing 
schedule, which guarantees that a solution for the problem (as far as it 
is known to the system) always exists. The contract-net protocol is executed 
whenever a new (degenerated) union has computed its local plan. The union then 
initiates the contract-net protocol as the manager and offers the plan to the other 
currently active unions. These unions check if they contain one or more modules 
that are potential sharing peers and, if this is the case, they offer a sharing 
commitment to the new union. The new union collects these offers and selects 
the one that has the largest cost saving potential. It then transfers the module 
to the winning union and ceases to exist because it does not contain other 
modules. If no union offers a sharing commitment, the new union remains active as 
a degenerated union. However, as in the TeleTruck system, this solution may 
be (and usually is) not optimal. In order to improve the quality of the existing 
solution, the simulated trading [1] procedure is run on the set of tasks (or the 
respective modules) currently known to the system. Unfortunately, executing 
the simulated trading protocol is a computationally expensive operation and so 
it is executed only periodically, either after a fixed number of new tasks has 
been added to the existing solution or when explicitly triggered by a user request. 
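
The integration of a new degenerated union can be sketched as follows. This is an illustration under stated assumptions rather than the TCS implementation: the class `Union`, the function `integrate` and the rule that a sharing offer saves half the rental cost of every shared route are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Union:
    name: str
    modules: list = field(default_factory=list)  # names of the contained modules
    routes: set = field(default_factory=set)     # location routes used by the union

    def sharing_offer(self, plan_routes, route_cost):
        """Return the offered cost saving, or None if there is nothing to share."""
        shared = self.routes & plan_routes
        if not shared:
            return None
        # Assumption: sharing a route saves half of its rental cost.
        return sum(route_cost[r] for r in shared) / 2


def integrate(new_union, active_unions, route_cost):
    """The new union acts as manager and joins the union with the best offer."""
    offers = []
    for union in active_unions:
        saving = union.sharing_offer(new_union.routes, route_cost)
        if saving is not None:
            offers.append((saving, union))
    if not offers:
        active_unions.append(new_union)  # stays active as a degenerated union
        return new_union
    _, winner = max(offers, key=lambda offer: offer[0])
    winner.modules.extend(new_union.modules)  # the module moves to the winner
    winner.routes |= new_union.routes
    return winner


# Hypothetical example data.
route_cost = {"r1": 4.0, "r2": 6.0, "r3": 2.0}
active = [Union("u1", ["m1"], {"r1"}), Union("u2", ["m2"], {"r2", "r3"})]
winner = integrate(Union("u3", ["m3"], {"r1", "r2", "r3"}), active, route_cost)
loner = integrate(Union("u4", ["m4"], {"r9"}), active, route_cost)
```

Here `u2` offers the larger saving and absorbs module `m3`, while `u4` receives no offer and is added to the active unions unchanged.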



4 Conclusion 

The paper presented basic concepts for holonic MAS design. At first glance 
there is nothing special about a holon because it behaves towards the outside 
world like any other agent. Most importantly, one can communicate with a holon 
by just sending a message to a single address. Only when we realise that holons 
represent a whole group of agents does the concept get interesting. 
Object-oriented programming offers no comparable programming 
construct that would support the design of such systems. We therefore have to 
further clarify from a theoretical and practical point of view how the formation 
of holons should take place. In our experience the main reason for agents to 
form a holon is that they share intentions about how to achieve specific goals. 
The paper clarifies how the concept of shared intentions can be specified from 
a theoretical point of view and gives application scenarios in which the process 
of establishing shared intentions in holonic MAS is illustrated. The application 
scenarios differ in the sense that in the first one (flexible manufacturing) agents 
form holons because they have different abilities and can achieve the task at 
hand only as a group. In the second example (order dispatching in haulage 
companies) the agents forming a holon have partly overlapping abilities. The last 
example (train coupling and sharing) demonstrates that even in a setting where 
we have agents with identical abilities holonic structures can be beneficial. 



Abstract. Randomized algorithms for deciding satisfiability were shown 
to be effective in solving problems with thousands of variables. However, 
these algorithms are not complete. That is, they provide no guarantee 
that a satisfying assignment, if one exists, will be found. Thus, when 
studying randomized algorithms, there are two important characteristics 
that need to be considered: the running time and, even more importantly, 
the accuracy — a measure of likelihood that a satisfying assignment will 
be found, provided one exists. In fact, we argue that without a reference 
to the accuracy, the notion of the running time for randomized algo- 
rithms is not well-defined. In this paper, we introduce a formal notion of 
accuracy. We use it to define a concept of the running time. We use both 
notions to study the random walk strategy GSAT algorithm. We inves- 
tigate the dependence of accuracy on properties of input formulas such 
as clause-to-variable ratio and the number of satisfying assignments. We 
demonstrate that the running time of GSAT grows exponentially in the 
number of variables of the input formula for randomly generated 3-CNF 
formulas and for the formulas encoding 3- and 4-colorability of graphs. 



1 Introduction 

The problem of deciding satisfiability of a boolean formula is extensively stud- 
ied in computer science. It appears prominently, as a prototypical NP-complete 
problem, in the investigations of computational complexity classes. It is studied 
by the automated theorem proving community. It is also of substantial interest 
to the AI community due to its applications in several areas including knowledge 
representation, diagnosis and planning. 

Deciding satisfiability of a boolean formula is an NP-complete problem. Thus, 
it is unlikely that sound and complete algorithms running in polynomial time 
exist. However, recent years brought several significant advances. First, fast (al- 
though, clearly, still exponential in the worst case) implementations of the cel- 
ebrated Davis-Putnam procedure [DP60] were found. These implementations 
are able to determine in a matter of seconds the satisfiability of critically con- 
strained CNF formulas with 300 variables and thousands of clauses [DABC96]. 
Second, several fast randomized algorithms were proposed and thoroughly stud- 
ied [SLM92,SKC96,SK93,MSG97,Spe96]. These algorithms randomly generate 
valuations and then apply some local improvement method in an attempt to 
reach a satisfying assignment. They are often very fast but they provide no 
guarantee that, given a satisfiable formula, a satisfying assignment will be found. That 
is, randomized algorithms, while often fast, are not complete. Still, they were 
shown to be quite effective and solved several practical large-scale satisfiability 
problems [KS92]. 

One of the most extensively studied randomized algorithms recently is GSAT 
[SLM92]. GSAT was shown to outperform the Davis-Putnam procedure on ran- 
domly generated 3-CNF formulas from the crossover region [SLM92]. However, 
GSAT’s performance on structured formulas (encoding coloring and planning 
problems) was poorer [SKC96,SK93,SKC94]. The basic GSAT algorithm would 
often become trapped within local minima and never reach a solution. To rem- 
edy this, several strategies for escaping from local minima were added to GSAT 
yielding its variants: GSAT with averaging, GSAT with clause weighting, GSAT 
with random walk strategy (RWS-GSAT), among others [SK93,SKC94]. GSAT 
with random walk strategy was shown to perform especially well. These studies, 
while conducted on a wide range of classes of formulas, rarely address the critical 
issue of the likelihood that GSAT will find a satisfying assignment, if one exists, 
and the running time is studied without a reference to this likelihood. Notable 
exceptions are [Spe96], where RWS-GSAT is compared with a simulated 
annealing algorithm SASAT, and [MSG97], where RWS-GSAT is compared to a tabu 
search method. 

In this paper, we propose a systematic approach for studying the quality of 
randomized algorithms. To this end, we introduce the concepts of the accuracy 
and of the running time relative to the accuracy. The accuracy measures how 
likely it is that a randomized algorithm finds a satisfying assignment, assuming 
that the input formula is satisfiable. It is clear that the accuracy of GSAT (and 
any other similar randomized algorithm) grows as a function of time — the longer 
we let the algorithm run, the better the chance that it will find a satisfying 
valuation (if one exists). In this paper, we present experimental results that 
allow us to quantify this intuition and get insights into the rate of growth of the 
accuracy. 

The notion of the running time of a randomized algorithm has not been rig- 
orously studied. First, in most cases, a randomized algorithm has its running 
time determined by the choice of parameters that specify the number of ran- 
dom guesses, the number of random steps in a local improvement process, etc. 
Second, in practical applications, randomized algorithms are often used in an 
interactive way. The algorithm is allowed to run until it finds a solution or the 
user decides not to wait any more, stops the execution, modifies the parameters 
of the algorithm or modifies the problem, and tries again. Finally, since random- 
ized algorithms are not complete, they may make errors by not finding satisfying 
assignments when such assignments exist. Algorithms that are faster may be less 
accurate and the trade-off must be taken into consideration [Spe96]. 

All of this points to the problems that arise when attempting to systematically 
study the running times of randomized algorithms and extrapolate their 
asymptotic behavior. In this paper, we define the concept of a running time relative 
to the accuracy. The relative running time is, intuitively, the time needed by a 
randomized algorithm to guarantee a postulated accuracy. We show in the paper 
that the relative running time is a useful performance measure for randomized 
satisfiability testing algorithms. In particular, we show that the running time of 
GSAT relative to a prescribed accuracy grows exponentially with the size of the 
problem. 

Related work where the emphasis has been on fine-tuning parameter settings 
[PW96,GW95] has shown somewhat different results in regard to the increase 
in time as the size of the problems grows. The growth shown by [PW96] is the 
retrospective variation of maxflips rather than the total number of flips. The 
numbers of variables for the randomly generated 3-CNF instances reported in 
[GW95] are 50, 70 and 100. Although our results are also limited by the ability of 
complete algorithms to determine satisfiable instances, we have results for 
50, 100, ..., 400 variable instances in the crossover region. The focus of our work 
is on maintaining accuracy as the size of the problems increases. 

Second, we study the dependence of the accuracy and the relative running 
time on the number of satisfying assignments. Intuitively, the more satisfying 
assignments the input formula has, the better the chance that a randomized 
algorithm finds one of them, and the shorter the time needed to do so. Again, 
our results quantify these intuitions. We show that the performance of GSAT 
increases exponentially with the growth in the number of satisfying assignments. 

These results have interesting implications for the problem of constructing 
sets of test cases for experimenting with satisfiability algorithms. It is now 
commonly accepted that random k-CNF formulas from the cross-over region are 
“difficult” from the point of view of deciding their satisfiability. Consequently, 
they are good candidates for testing satisfiability algorithms. These claims are 
based on the studies of the performance of the Davis-Putnam procedure. Indeed, 
on average, it takes the most time to decide satisfiability of CNF formulas ran- 
domly generated from the cross-over region. However, the suitability of formulas 
generated randomly from the cross-over region for the studies of the performance 
of randomized algorithms is less clear. Our results indicate that the performance 
of randomized algorithms critically depends on the number of satisfying assign- 
ments and much less on the density of the problem. Both under-constrained and 
over-constrained problems with a small number of satisfying assignments turn 
out to be hard for randomized algorithms. At the same time, the Davis-Putnam 
procedure, while sensitive to the density, is quite robust with respect to the number 
of satisfying truth assignments. 

On the other hand, there are classes of problems that are “easy” for the 
Davis-Putnam procedure. For instance, the Davis-Putnam procedure is very effective 
in finding 3-colorings of graphs from special classes such as 2-trees (see Section 4 
for definitions). Thus, they are not appropriate benchmarks for Davis-Putnam type 
algorithms. However, a common intuition is that structured problems are “hard” 
for randomized algorithms [SKC96,SK93,SKC94]. In this paper we study this 
claim for the formulas that encode the 3- and 4-coloring problems for 2-trees. We show 
that GSAT’s running time relative to a given accuracy grows exponentially with 




52 D. East and M. Truszczynski 



the size of a graph. This provides formal evidence for the “hardness” claim for 
this class of problems and implies that, while not useful in the studies of complete 
algorithms such as the Davis-Putnam method, they are excellent benchmarks for 
studying the performance of randomized algorithms. 

The main contribution of our paper is not as much a discovery of an un- 
expected behavior of randomized algorithms for testing satisfiability as it is a 
proposed methodology for studying them. Our concepts of the accuracy and the 
relative running time allow us to quantify claims that are often accepted on the 
basis of intuitive arguments but have not been formally pinpointed. 

In the paper, we apply our approach to the algorithm RWS-GSAT from 
[SK93,SKC94]. This algorithm is commonly regarded as one of the best 
randomized algorithms for satisfiability testing to date. For our experiments we used 
walksat version 35, downloaded from ftp.research.att.com/dist/ai, and ran it on a 
SPARC Station 20. 

2 Accuracy and Running Time 

In this section, we will formally introduce the notion of the accuracy of a random- 
ized algorithm A. We will then define the concept of the running time relative 
to accuracy. 

Let $\mathcal{F}$ be a finite set of satisfiable CNF formulas and let 
$\mathcal{P}$ be a probability distribution defined on $\mathcal{F}$. Let A be a 
sound algorithm (randomized or not) to test satisfiability. By the accuracy of A 
(relative to the probability space $(\mathcal{F}, \mathcal{P})$), we mean the 
probability that A finds a satisfying assignment for a formula generated from 
$\mathcal{F}$ according to the distribution $\mathcal{P}$. Clearly, the accuracy 
of complete algorithms (for all possible spaces of satisfiable formulas) is 1 and, 
intuitively, the higher the accuracy, the more “complete” is the algorithm for 
the space $(\mathcal{F}, \mathcal{P})$. 
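
Written out as a formula, the definition above amounts to a weighted sum over the formulas of the space. The shorthand $\mathrm{acc}$ is a notational assumption made here for illustration; the text itself does not introduce a symbol for the accuracy.

```latex
\mathrm{acc}_{(\mathcal{F},\mathcal{P})}(A)
  \;=\; \sum_{F \in \mathcal{F}} \mathcal{P}(F)\,
        \Pr\bigl[\, A \text{ returns a satisfying assignment on input } F \,\bigr]
```

For a complete algorithm every inner probability equals 1, so the sum is 1, in line with the remark about complete algorithms.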

When studying and comparing randomized algorithms which are not 
complete, accuracy seems to be an important characteristic. It needs to be taken 
into account — in addition to the running time. Clearly, very fast algorithms 
that often return no satisfying assignments, even if they exist, are not satis- 
factory. In fact, most of the work on developing better randomized algorithms 
can be viewed as aimed at increasing the accuracy of these algorithms. Despite 
this, the accuracy is rarely explicitly mentioned and studied (exceptions are 
[Spe96,MSG97]). 

We will propose now an approach through which the running times of ran- 
domized satisfiability testing algorithms can be compared. We will restrict our 
considerations to the class of randomized algorithms designed according to the 
following general pattern. These algorithms consist of a series of tries. In each 
try, a truth assignment is randomly generated. This truth assignment is then 
subject to a series of local improvement steps aimed at, eventually, reaching a 
satisfying assignment. The maximum number of tries the algorithm will attempt 
and the length of each try are the parameters of the algorithm. They are 
usually specified by the user. We will denote by MT the maximum number of tries 
and by MF the maximum number of local improvement steps. Algorithms 
designed according to this pattern differ, besides possible differences in the 
values MT and MF, in the specific definition of the local improvement process. 
A class of algorithms of this structure is quite wide and contains, in particular, 
the GSAT family of algorithms, as well as algorithms based on the simulated 
annealing approach. 

Let A be a randomized algorithm falling into the class described above. 
Clearly, its average running time on instances from the space 
$(\mathcal{F}, \mathcal{P})$ of satisfiable formulas depends, to a large degree, on 
the particular choices for MT and MF. To get an objective measure of the running 
time, independent of MT and MF, when defining time, we require that a postulated 
accuracy be met. Formally, let a, 0 < a < 1, be a real number (a postulated 
accuracy). Define the running time of A relative to accuracy a, $t^{a}$, to be the 
minimum time t such that for some positive integers MT and MF, the algorithm A 
with the maximum of MT tries and with the maximum of MF local improvement 
steps per try satisfies: 

1. the average running time on instances from $(\mathcal{F}, \mathcal{P})$ is at most t, and 

2. the accuracy of A on $(\mathcal{F}, \mathcal{P})$ is at least a. 

Intuitively, $t^{a}$ is the minimum expected time that guarantees accuracy a. In 
Section 3, we describe an experimental approach that can be used to estimate 
the relative running time. 
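
The definition of the relative running time can also be condensed into one expression. The symbols $t^{a}$, $\bar{t}(MT, MF)$ (the average running time of the algorithm with the given parameters) and $\mathrm{acc}(MT, MF)$ (its accuracy with these parameters) are notational assumptions made for this sketch, and the minimum is assumed to be attained.

```latex
t^{a} \;=\; \min \bigl\{\, \bar{t}(MT, MF) \;:\;
      MT, MF \in \mathbb{Z}_{>0},\ \mathrm{acc}(MT, MF) \ge a \,\bigr\}
```

Read this way, estimating the relative running time amounts to searching over parameter pairs that reach the postulated accuracy and keeping the smallest average running time among them.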

The concepts of accuracy and running time relative to accuracy open 
a number of important (and, undoubtedly, very difficult) theoretical problems. 
However, in this paper we will focus on an experimental study of accuracy and 
relative running time for a GSAT-type algorithm. These algorithms follow a 
general pattern for the local improvement process. Given a truth assignment, 
GSAT selects a variable such that after its truth value is flipped (changed to the 
opposite one) the number of unsatisfied clauses is minimal. Then, the flip is 
actually made depending on the result of some additional (often again random) 
procedure. 
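
A minimal Python sketch of a GSAT-style procedure with a random walk step is shown below. It illustrates the general pattern (MT tries, MF flips per try) and is not the walksat code used in the experiments: the function names, the clause representation as tuples of signed integers, and the walk probability `walk_prob` are assumptions.

```python
import random


def unsatisfied(clauses, assignment):
    """Clauses (tuples of non-zero ints) that `assignment` leaves false."""
    return [c for c in clauses
            if not any(assignment[abs(lit)] == (lit > 0) for lit in c)]


def rws_gsat(clauses, n_vars, max_tries, max_flips, walk_prob=0.5, seed=None):
    """Return a satisfying assignment (dict var -> bool) or None if none was found."""
    rng = random.Random(seed)
    variables = range(1, n_vars + 1)
    for _ in range(max_tries):                      # at most MT tries
        assignment = {v: rng.random() < 0.5 for v in variables}
        for _ in range(max_flips):                  # at most MF flips per try
            unsat = unsatisfied(clauses, assignment)
            if not unsat:
                return assignment
            if rng.random() < walk_prob:
                # Random walk step: flip a variable of a random unsatisfied clause.
                var = abs(rng.choice(rng.choice(unsat)))
            else:
                # Greedy step: flip a variable that minimises the unsatisfied count.
                scores = {}
                for v in variables:
                    assignment[v] = not assignment[v]
                    scores[v] = len(unsatisfied(clauses, assignment))
                    assignment[v] = not assignment[v]
                lowest = min(scores.values())
                var = rng.choice([v for v in variables if scores[v] == lowest])
            assignment[var] = not assignment[var]
        if not unsatisfied(clauses, assignment):
            return assignment
    return None


# Small satisfiable example; its only model is x1 = True, x2 = False, x3 = True.
formula = [(1, 2), (-1, 3), (-2, -3), (1, 3)]
model = rws_gsat(formula, n_vars=3, max_tries=50, max_flips=100, seed=1)
```

The procedure is sound but not complete: a returned assignment always satisfies the formula, whereas `None` only means that no satisfying assignment was found within the given limits.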

In our experiments, we used two types of data sets. Data sets of the first type 
consist of randomly generated 3-CNF formulas [MSL92]. Data sets of the second 
type consist of CNF formulas encoding the k-colorability problem for randomly 
generated 2-trees. These two classes of data sets, as well as the results of the 
experiments, are described in detail in the next two sections. 

3 Random 3-CNF Formulas 

Consider a randomly generated 3-CNF formula F, with N variables and the ratio 
of clauses to variables equal to L. Intuitively, when L increases, the probability 
that F is satisfiable should decrease. It is indeed so [MSL92]. What is more 
surprising, it switches from being close to one to being close to zero very abruptly 
in a very small range of L, approximately from 4.25 to 4.3. The set of 3-CNF 
formulas at the cross-over region will be denoted by CR(N). Implementations of the 
Davis-Putnam procedure take, on average, the most time on 3-CNF formulas 
generated (according to a uniform probability distribution) from the cross-over 
regions. Thus, these formulas are commonly regarded as good test cases for 
experimental studies of the performance of satisfiability algorithms [CA93,Fre96]. 

We used seven sets of satisfiable 3-CNF formulas generated from the 
cross-over regions CR(N), N = 100, 150, ..., 400. These data sets are denoted by 
DS(N). Each data set DS(N) was obtained by randomly generating 3-CNF 
formulas with N variables and a clause-to-variable ratio of L = 4.30 (for N = 100) 
or L = 4.25 (for N ≥ 150). For each formula, the Davis-Putnam algorithm was then 
used to decide its satisfiability. The first one thousand satisfiable formulas found 
in this way were chosen to form the data set. 
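
The construction of such a data set can be sketched in Python for very small N. The brute-force check `is_satisfiable` is only a stand-in for a complete procedure such as Davis-Putnam and is feasible for a handful of variables at most; all function names and the example parameters are assumptions.

```python
import itertools
import random


def random_3cnf(n_vars, ratio, rng):
    """Random 3-CNF formula with round(ratio * n_vars) clauses over n_vars variables."""
    formula = []
    for _ in range(round(ratio * n_vars)):
        variables = rng.sample(range(1, n_vars + 1), 3)  # three distinct variables
        formula.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return formula


def is_satisfiable(formula, n_vars):
    """Brute-force satisfiability test (stand-in for a complete procedure)."""
    for bits in itertools.product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in formula):
            return True
    return False


def build_data_set(n_vars, ratio, size, seed=0):
    """Collect the first `size` satisfiable formulas that are generated."""
    rng = random.Random(seed)
    data_set = []
    while len(data_set) < size:
        formula = random_3cnf(n_vars, ratio, rng)
        if is_satisfiable(formula, n_vars):
            data_set.append(formula)
    return data_set


# Toy parameters; the experiments in the text use N = 100, ..., 400 and 1000 formulas.
data_set = build_data_set(n_vars=8, ratio=4.25, size=5)
```

Discarding the unsatisfiable formulas is what ties the feasible problem size to the capabilities of the complete procedure, which is the limitation discussed in the following paragraph.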

The random algorithms are often used with much larger values of N than we 
have reported in this paper. The importance of accuracy in this study required 
that we have only satisfiable formulas (otherwise, the accuracy cannot be reliably 
estimated). This limited the size of randomly generated 3-CNF formulas used 
in our study since we had to use a complete satisfiability testing procedure to 
discard those randomly generated formulas that were not satisfiable. In Section 
5, we discuss ways in which hard test cases for randomized algorithms can be 
generated that are not subject to the size limitation. 

For each data set DS(N), we determined values for MF, say MF_1, ..., MF_m, 
and for MT, say MT_1, ..., MT_n, for use with RWS-GSAT, big enough to result in 
an accuracy of at least 0.98. For instance, for N = 100, MF ranged from 100 to 
1000, with the increment of 100, and MT ranged from 5 to 50, with the increment 
of 5. Next, for each combination of MF and MT, we ran RWS-GSAT on all formulas 
in DS(N) and tabulated both the running time and the percentage of problems 
for which the satisfying assignment was found (this quantity was used as an 
estimate of the accuracy). These estimates and average running times for the 
data set DS(100) are shown in Tables 1 and 2. 



Table 1. Running time (seconds), RWS-GSAT, N = 100, L = 4.3 

  MT \ MF   100   200   300   400   500   600   700   800   900  1000
  50       0.07  0.07  0.06  0.05  0.04  0.05  0.05  0.06  0.05  0.05
  45       0.06  0.06  0.05  0.05  0.04  0.04  0.04  0.05  0.05  0.04
  40       0.05  0.05  0.05  0.04  0.04  0.04  0.05  0.03  0.04  0.03
  35       0.05  0.05  0.04  0.05  0.05  0.04  0.04  0.05  0.05  0.04
  30       0.04  0.04  0.04  0.04  0.05  0.04  0.04  0.04  0.04  0.04
  25       0.03  0.04  0.04  0.04  0.04  0.04  0.04  0.04  0.04  0.04
  20       0.03  0.03  0.03  0.04  0.04  0.04  0.04  0.03  0.04  0.03
  15       0.02  0.03  0.03  0.03  0.04  0.03  0.04  0.04  0.03  0.04
  10       0.02  0.02  0.02  0.03  0.03  0.03  0.03  0.03  0.03  0.03
  5        0.01  0.01  0.01  0.02  0.02  0.02  0.02  0.02  0.02  0.02

Table 2. Accuracy (%), RWS-GSAT, N = 100, L = 4.3 

  MT \ MF   100   200   300   400   500   600   700   800   900  1000
  50         26    72    84    98    99    97    98    97    99   100
  45         26    70    87    90    97    96    99    99    98   100
  40         23    71    81    94    98    98    98

Fixing a required accuracy, say at a level of a, we then looked for the best time 
which resulted in this (or higher) accuracy. We used this time as an experimental 
estimate for $t^{a}$. For instance, there are 12 entries in the accuracy table with 
accuracy 0.99 or more. The lowest value from the corresponding entries in the 
running time table is 0.03 sec. and it is used as an estimate for $t^{0.99}$. 
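
The table lookup described above can be expressed as a short Python function. The function name and the dictionary layout are assumptions; the sample entries are a small excerpt of Tables 1 and 2 (accuracy converted from percent to a fraction), so the value obtained from the excerpt need not equal the estimate derived from the full tables.

```python
def relative_running_time(times, accuracies, target):
    """Smallest tabulated running time among parameter pairs reaching `target`.

    Both dictionaries are keyed by (MT, MF); returns None if no pair qualifies.
    """
    feasible = [times[key] for key, acc in accuracies.items() if acc >= target]
    return min(feasible) if feasible else None


# Excerpt of Tables 1 and 2 for N = 100, keyed by (MT, MF).
times = {(50, 100): 0.07, (50, 500): 0.04, (50, 1000): 0.05,
         (45, 400): 0.05, (45, 1000): 0.04}
accuracies = {(50, 100): 0.26, (50, 500): 0.99, (50, 1000): 1.00,
              (45, 400): 0.90, (45, 1000): 1.00}
estimate = relative_running_time(times, accuracies, target=0.99)
```

Because only tabulated parameter pairs are considered, the result is an upper bound on the relative running time for the given data set rather than the exact minimum over all choices of MT and MF.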

The relative running times for RWS-GSAT run on the data sets DS(N), 
N = 100, 150, ..., 400, and for a = 0.90, a = 0.95 and a = 0.50, are shown 
in Fig. 1a. Figure 1b shows the average number of flips for the same data sets. 
The average number of flips provides a machine-independent comparison. Both 
graphs demonstrate exponential growth, with the running time increasing by a 
factor of 1.5 to 2 for every 50 additional variables in the input problems. Thus, 
while GSAT outperforms the Davis-Putnam procedure for instances generated from 
the critical regions, if we prescribe the accuracy, its running time is still 
exponential and, thus, GSAT will quickly reach the limits of its applicability. We did 
not extend our results beyond formulas with up to 400 variables due to the 
limitations of the Davis-Putnam procedure (or any other complete method to test 
satisfiability). For problems of this size, GSAT is still extremely effective (takes 
only about 2.5 seconds). Data sets used in Section 5 do not have this limitation 
(we know all formulas in these sets are satisfiable and there is no need to refer to 
complete satisfiability testing programs). The results presented there also 
illustrate the exponential growth of the relative running time and are consistent 
with those discussed here. 



4 Number of Satisfying Assignments 

It seems intuitive that accuracy and running time would be dependent on the 
number of possible satisfying assignments. Studies using randomly generated 
3-CNF formulas [CFG+96] and 3-CNF formulas generated randomly with pa- 
rameters allowing the user to control the number of satisfiable solutions for each 
instance [CI95] show this correlation. 
Fig. 1. a) Running time of RWS-GSAT on randomly generated 3-CNF formulas, plotted
on a logarithmic scale as a function of the number of variables. b) Average number
of flips for RWS-GSAT on randomly generated 3-CNF formulas, plotted on a logarithmic
scale as a function of the number of variables (machine-independent comparison).



In the same way as for the data sets DS(N), we constructed data sets
$DS(100, p_{k-1}, p_k)$, where $p_0 = 1$ and $p_k = 2^{k-1} \cdot 100$, $k = 1, \ldots, 10$. Each
data set $DS(100, p_{k-1}, p_k)$ consists of 100 satisfiable 3-CNF formulas generated
from the cross-over region CR(100) and having more than $p_{k-1}$ and no more
than $p_k$ satisfying assignments. Each data set was formed by randomly generating
3-CNF formulas from the cross-over region CR(100) and by selecting the first
100 formulas with the number of satisfying assignments falling in the prescribed
range (again, we used the Davis-Putnam procedure).
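As a small illustration of how formulas could be sorted into these data sets, the sketch below assumes the thresholds are read as p_0 = 1 and p_k = 2^(k-1) * 100 for k = 1, ..., 10 (this reading of the formula is an assumption), and returns the index k of the data set whose range (p_{k-1}, p_k] contains a given solution count. The function names are hypothetical.

```python
def thresholds(kmax=10):
    """Return [p_0, p_1, ..., p_kmax] with p_0 = 1 and p_k = 2**(k-1) * 100."""
    return [1] + [2 ** (k - 1) * 100 for k in range(1, kmax + 1)]


def bucket(n_solutions, p):
    """Index k with p[k-1] < n_solutions <= p[k], or None if out of range."""
    for k in range(1, len(p)):
        if p[k - 1] < n_solutions <= p[k]:
            return k
    return None


p = thresholds()
print(p[:4], bucket(150, p))
```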

For each data set we ran the RWS-GSAT algorithm with MF = 500 and
MT = 50, thus allowing the same upper limits for the number of random steps
for all data sets (these values resulted in an accuracy of .99 in our experiments
with the data set DS(100) discussed earlier). Figure 2 summarizes our findings.
It shows that there is a strong relationship between accuracy and the number
of possible satisfying assignments. Generally, instances with a small number of
solutions are much harder for RWS-GSAT than those with a large number of
solutions. Moreover, this observation is not affected by how constrained the input
formulas are. We observed the same general behavior when we repeated the
experiment for data sets of 3-CNF formulas generated from the under-constrained
region (100 variables, 410 clauses) and the over-constrained region (100 variables,
450 clauses), with under-constrained instances with few solutions being the hardest.

These results indicate that, when generating data sets for experimental stud- 
ies of randomized algorithms, it is more important to ensure that they have few 
solutions rather than that they come from the critically constrained region. 

5 CNF Formulas Encoding k-colorability

To expand the scope of applicability of our results and argue their robustness,
we also used in our study data sets consisting of CNF formulas encoding the
k-colorability problem for graphs. While often easy for the Davis-Putnam procedure,
formulas of this type are believed to be “hard” for randomized algorithms and
were used in the past in experimental studies of their performance. In particular,
it was reported in [SK93] that RWS-GSAT does not perform well on such
inputs (see also [JAMS91]).

Fig. 2. Accuracy of RWS-GSAT as a function of the number of satisfying assignments
(axis label: Number of solutions x 100)

Given a graph G with the vertex set $V = \{v_1, \ldots, v_n\}$ and the edge set
$E = \{e_1, \ldots, e_m\}$, we construct the CNF formula COL(G, k) as follows. First,
we introduce new propositional variables $col(v, i)$, $v \in V$ and $i = 1, \ldots, k$. The
variable $col(v, i)$ expresses the fact that the vertex v is colored with the color i.
Now, we define COL(G, k) to consist of the following clauses:

1. $\neg col(x, i) \lor \neg col(y, i)$, for every edge {x, y} from G and every $i = 1, \ldots, k$,

2. $col(x, 1) \lor \ldots \lor col(x, k)$, for every vertex x of G,

3. $\neg col(x, i) \lor \neg col(x, j)$, for every vertex x of G and for every i, j, $1 \le i < j \le k$.

It is easy to see that there is a one-to-one correspondence between k-colorings
of G and satisfying assignments for COL(G, k). To generate formulas for
experimenting with RWS-GSAT (and other satisfiability testing procedures) it is, then,
enough to generate graphs G and produce formulas COL(G, k).
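The construction of COL(G, k) can be sketched as follows. This is a minimal illustration, not the generator used in the experiments: the function names, the integer numbering of the variables col(v, i), and the DIMACS-style representation of literals as signed integers are all assumptions made here. A brute-force model counter is included only to check the one-to-one correspondence on a tiny graph.

```python
from itertools import combinations, product


def col_formula(vertices, edges, k):
    """Return (var, clauses) for COL(G, k); literals are signed integers."""
    colors = range(1, k + 1)
    # var[(v, i)] is the (hypothetical) index of the variable col(v, i).
    var = {vi: n for n, vi in enumerate(product(vertices, colors), start=1)}
    clauses = []
    for x, y in edges:                      # clause type 1
        for i in colors:
            clauses.append([-var[x, i], -var[y, i]])
    for x in vertices:
        clauses.append([var[x, i] for i in colors])   # clause type 2
        for i, j in combinations(colors, 2):          # clause type 3
            clauses.append([-var[x, i], -var[x, j]])
    return var, clauses


def count_models(n_vars, clauses):
    """Count satisfying assignments by exhaustive enumeration (tiny inputs only)."""
    count = 0
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            count += 1
    return count


# A triangle has exactly 6 proper 3-colorings.
var, clauses = col_formula([1, 2, 3], [(1, 2), (1, 3), (2, 3)], 3)
print(len(var), len(clauses), count_models(len(var), clauses))
```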

In our experiments, we used formulas that encode 3-colorings for graphs
known as 2-trees. The class of 2-trees is defined inductively as follows:

1. A complete graph on three vertices (a “triangle”) is a 2-tree.

2. If T is a 2-tree, then a graph obtained by selecting an edge {x, y} in T, adding
to T a new vertex z and joining z to x and y is also a 2-tree.

A 2-tree with 6 vertices is shown in Fig. 3. The vertices of the original triangle
are labeled 1, 2 and 3. The remaining vertices are labeled according to the order
in which they were added.
Fig. 3. An example 2-tree with 6 vertices

The concept of 2-trees can be generalized to k-trees, for an arbitrary $k \ge 2$.
Graphs in these classes are important: they have bounded tree-width and,
consequently, many NP-complete problems can be solved for them in polynomial
time [AP89].

We can generate 2-trees randomly by simulating the definition given above
and by selecting an edge for “expansion” randomly in the current 2-tree T. We
generated in this way families G(p), for p = 50, 60, ... , 150, each consisting of
one hundred randomly generated 2-trees with p vertices. Then, we created sets
of CNF formulas $C(p, 3) = \{COL(T, 3) : T \in G(p)\}$, for p = 50, 60, . . . , 150. Each
formula in a set C(p, 3) has exactly 6 satisfying assignments (since each 2-tree
has exactly 6 different 3-colorings). Thus, they are appropriate for testing the
accuracy of RWS-GSAT.
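The random generation of 2-trees can be sketched as below. The function names and the uniform choice of the edge to expand are assumptions made for illustration. The small counter colors vertices in the order they were added, so it doubles as a check that every generated 2-tree has exactly 6 proper 3-colorings.

```python
import random


def random_2tree(p, rng=random):
    """Return (vertices, edges) of a random 2-tree with p >= 3 vertices."""
    edges = [(1, 2), (1, 3), (2, 3)]          # the initial triangle
    for z in range(4, p + 1):
        x, y = rng.choice(edges)              # edge selected for "expansion"
        edges += [(x, z), (y, z)]             # join the new vertex z to x and y
    return list(range(1, p + 1)), edges


def count_colorings(vertices, edges, k):
    """Count proper k-colorings by backtracking over vertices in label order."""
    nbrs = {v: set() for v in vertices}
    for x, y in edges:
        nbrs[x].add(y)
        nbrs[y].add(x)
    color = {}

    def extend(idx):
        if idx == len(vertices):
            return 1
        v, total = vertices[idx], 0
        for c in range(k):
            if all(color.get(u) != c for u in nbrs[v]):
                color[v] = c
                total += extend(idx + 1)
                del color[v]
        return total

    return extend(0)


vertices, edges = random_2tree(10, random.Random(0))
print(len(edges), count_colorings(vertices, edges, 3))
```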

Using CNF formulas of this type has an important benefit. Data sets can 
be prepared without the need to use complete (but very inefficient for large in- 
puts) satisfiability testing procedures. By appropriately choosing the underlying 
graphs, we can guarantee the satisfiability of the resulting formulas and, often, 
we also have some control over the number of solutions (for instance, in the case 
of 3-colorability of 2-trees there are exactly 6 solutions).

We used the same methodology as the one described in the previous section
to tabulate the accuracy and the running time of RWS-GSAT for a large range
of choices for the parameters MF and MT. From these tables we estimated the
relative running time for a = 0.95 for each of the data sets. The results that
present this running time as a function of the number of vertices in a graph
(which is of the same order as the number of variables in the corresponding CNF
formula) are gathered in Fig. 4a. They show that RWS-GSAT’s performance
deteriorates exponentially (time grows by a factor of 3 to 4 for every 50
additional vertices).

An important question is how to approach constraint satisfaction problems
that seem to be beyond the scope of applicability of randomized algorithms.
A common approach is to relax some constraints. It often works because the
resulting constraint sets (theories) are “easier” to satisfy (they admit more
satisfying assignments). We have already discussed the issue of the number of
solutions in the previous section. Now, we will illustrate the effect of increasing
the number of solutions (relaxing the constraints) in the case of the colorability
problem. To this end, we will consider formulas from the spaces C(p, 4),
representing 4-colorability of 2-trees. These formulas have exponentially many
satisfying truth assignments (a 2-tree with p vertices has exactly $3 \times 2^p$
4-colorings). For these formulas we also tabulated the times for a = 0.95, as a
function of the number of vertices in the graph. The results are shown in Fig. 4b.
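The two solution counts used in this section can be checked with a short derivation that relies only on the inductive definition of 2-trees (the symbol $N_k(p)$ is introduced here for illustration). The initial triangle can be colored with k colors in $k(k-1)(k-2)$ ways, and each of the remaining $p - 3$ vertices is joined to two adjacent, hence differently colored, vertices, which leaves it $k - 2$ colors:

```latex
N_k(p) = k(k-1)(k-2)\,(k-2)^{p-3} = k(k-1)(k-2)^{p-2}, \qquad
N_3(p) = 3 \cdot 2 \cdot 1^{p-2} = 6, \qquad
N_4(p) = 4 \cdot 3 \cdot 2^{p-2} = 3 \cdot 2^{p}.
```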




Fig. 4. Running time of RWS-GSAT. a) Formulas encoding 3-colorability, plotted on 
a logarithmic scale as a function of the number of vertices, b) Formulas encoding 4- 
colorability, plotted on a logarithmic scale as a function of the number of vertices. 



Thus, despite the fact that the size of a formula from C(p, 4) is larger than the
size of a formula from C(p, 3) by a factor of 1.6, RWS-GSAT’s running times are
much lower. In particular, within .5 seconds RWS-GSAT can find a 4-coloring of
randomly generated 2-trees with 500 vertices. As demonstrated by Fig. 4,
RWS-GSAT would require thousands of seconds for 2-trees of this size to guarantee
the same accuracy when finding 3-colorings. Thus, even a rather modest relaxation
of constraints can increase the number of satisfying assignments substantially
enough to lead to noticeable speed-ups. On the other hand, even though “easier”,
the theories encoding the 4-colorability problem for 2-trees are still hard to solve
by GSAT, as the rate of growth of the relative running time is exponential
(Fig. 4b).

The results of this section further confirm and provide quantitative insights
into our earlier claims about the exponential behavior of the relative running
time for GSAT and on the dependence of the relative running time on the number
of solutions. However, they also point out that selecting a class of graphs
(we selected the class of 2-trees here but there are, clearly, many other
possibilities) and a graph problem (we focused on colorability but there are many
other problems such as hamiltonicity, existence of vertex covers, cliques, etc.),
and then encoding this problem for graphs from the selected class, yields a family
of formulas that can be used in testing satisfiability algorithms. The main benefit
of the approach is that by selecting a suitable class of graphs, we can guarantee
satisfiability of the resulting formulas and can control the number of solutions,
thus eliminating the need to resort to complete satisfiability procedures when
preparing the test cases. We intend to further pursue this direction.
6 Conclusions

In this paper we formally stated the definitions of the accuracy of a randomized
algorithm and of its running time relative to a prescribed accuracy. We showed
that these notions enable objective studies and comparisons of the performance
and quality of randomized algorithms. We applied our approach to study the
RWS-GSAT algorithm. We showed that, given a prescribed accuracy, the running
time of RWS-GSAT was exponential in the number of variables for several
classes of randomly generated CNF formulas. We also showed that the accuracy
(and, consequently, the running time relative to the accuracy) strongly depended
on the number of satisfying assignments: the bigger this number, the easier was
the problem for RWS-GSAT. This observation is independent of the “density”
of the input formula. The results suggest that satisfiable CNF formulas with few
satisfying assignments are hard for RWS-GSAT and should be used for
comparisons and benchmarking. One such class of formulas, CNF encodings of the
3-colorability problem for 2-trees, was described in the paper and used in our
study of RWS-GSAT.

The exponential behavior of RWS-GSAT points to the limitations of randomized
algorithms. However, our results indicating that input formulas with more
solutions are “easier” for RWS-GSAT to deal with explain RWS-GSAT’s success in
solving some large practical problems. Such problems can be made “easy” for
RWS-GSAT by relaxing some of the constraints.
Abstract. This paper studies the practical impact of the branching
heuristics used in Propositional Satisfiability (SAT) algorithms, when ap- 
plied to solving real-world instances of SAT. In addition, different SAT 
algorithms are experimentally evaluated. The main conclusion of this 
study is that even though branching heuristics are crucial for solving 
SAT, other aspects of the organization of SAT algorithms are also essen- 
tial. Moreover, we provide empirical evidence that for practical instances 
of SAT, the search pruning techniques included in the most competitive 
SAT algorithms may be of more fundamental significance than branching 
heuristics. 



Keywords: Propositional Satisfiability, Backtrack Search, 
1 Introduction

Propositional Satisfiability (SAT) is a core problem in Artificial Intelligence,
as well as in many other areas of Computer Science and Engineering. Recent
years have seen dramatic improvements in the real-world performance of SAT
algorithms. On the one hand, local search algorithms have been used for solving large
random instances of SAT and some classes of practical instances of SAT [21, 20,
15, 7]. On the other hand, systematic backtrack search algorithms, based on
new and effective search pruning techniques, have been used for solving large 
structured real-world instances of SAT, a significant fraction of which requires 
proving unsatisfiability. Among the many existing backtrack search algorithms, 
rel_sat [2], GRASP [18] and SATO [24] have been shown, on a large number 
of real-world instances of SAT, to be among the most competitive backtrack 
search SAT algorithms. There are of course other backtrack search SAT algo- 
rithms, which are competitive for specific classes of instances of SAT. Examples 
include satz [17], POSIT [9], NTAB [5], 2cl [11] and CSAT [8], among others. 
It is interesting to note that the most competitive backtrack search SAT algo- 
rithms share a few common properties, which have empirically been shown to 
be particularly useful for solving hard real-world instances of SAT. Relevant
examples of these techniques are non-chronological backtracking search strategies
and clause (nogood) identification and recording [3, 10, 22].

One key aspect of backtrack search SAT algorithms is how assignments are 
selected at each step of the algorithm, i.e. the branching heuristics. Over the 
years many branching heuristics have been proposed by different authors [5,9, 
13,17]. In this paper we propose to study several of the branching heuristics 
that have been shown to be more effective in practice. For this purpose we ap- 
ply different backtrack search SAT algorithms and different branching heuristics 
to different classes of real-world practical applications of SAT. One interesting 
result of this study is that even though branching heuristics are indeed of im- 
portance in solving SAT, other aspects of the organization of backtrack search 
algorithms turn out to be of far more significance when the objective is to reduce 
the amount of search and the running time. This empirical result motivates the 
development of new search pruning techniques, in particular when the objective 
is to solve large, structured and hard instances of SAT. 

The paper is organized as follows. First, Section 2 introduces the notational
framework used in the remainder of the paper. Afterwards, in Section 3 current
state-of-the-art backtrack search SAT algorithms are briefly reviewed. Section 4
describes the different branching heuristics evaluated in this paper.
Section 5 provides and analyzes experimental results on instances of SAT from
different application domains. Finally, Section 6 concludes the paper.



2 Definitions 

This section introduces the notational framework used throughout the paper.
Propositional variables are denoted $x_1, \ldots, x_n$, and can be assigned truth values
false (also, F or 0) or true (also, T or 1). The truth value assigned to a variable
$x$ is denoted by $\nu(x)$. A literal $l$ is either a variable $x_i$ or its negation $\neg x_i$. A
clause $\omega$ is a disjunction of literals and a CNF formula $\varphi$ is a conjunction of
clauses. A clause is said to be satisfied if at least one of its literals assumes value
1, unsatisfied if all of its literals assume value 0, unit if all but one literal assume
value 0, and unresolved otherwise. Literals with no assigned truth value are said
to be free literals. A formula is said to be satisfied if all its clauses are satisfied,
and is unsatisfied if at least one clause is unsatisfied. The SAT problem is to
decide whether there exists a truth assignment to the variables such that the
formula becomes satisfied.

It will often be simpler to refer to clauses as sets of literals, and to the CNF
formula as a set of clauses. Hence, the notation $l \in \omega$ indicates that a literal $l$ is
one of the literals of clause $\omega$, whereas the notation $\omega \in \varphi$ indicates that clause
$\omega$ is one of the clauses of CNF formula $\varphi$.

In the following sections we shall address backtrack search algorithms for
SAT. Most if not all backtrack search SAT algorithms apply extensively the unit
clause rule [6]. If a clause is unit, then the sole free literal must be assigned value
1 for the formula to be satisfiable. The iterated application of the unit clause
rule is often referred to as Boolean Constraint Propagation (BCP) [23]. For
implementing some of the techniques common to some of the most competitive
backtrack search algorithms for SAT [2, 18, 24], it is necessary to properly explain
the truth assignments to the propositional variables that are implied by the
clauses of the CNF formula. For example, let $x = \nu_x$ be a truth assignment
implied by applying the unit clause rule to a unit clause $\omega$. Then the
explanation for this assignment is the set of assignments associated with the
remaining literals of $\omega$, which are assigned value 0.

Let $\omega = (x_1 \lor \neg x_2 \lor x_3)$ be a clause of a CNF formula $\varphi$, and assume the
truth assignments $\{x_1 = 0, x_3 = 0\}$. Then, for the clause to be satisfied we must
necessarily have $x_2 = 0$. We say that the implied assignment $x_2 = 0$ has the
explanation $\{x_1 = 0, x_3 = 0\}$. A more formal description of explanations for
implied variable assignments in the context of SAT, as well as a description of
mechanisms for their identification, can be found for example in [18].
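The unit clause rule and the notion of explanation can be illustrated with a short Python sketch. It is a plain, unoptimized reading of the definitions above, not the procedure of any particular solver; the function name, the signed-integer encoding of literals, and the shape of the returned values are assumptions made here.

```python
def bcp(clauses, assignment):
    """Iterate the unit clause rule (Boolean Constraint Propagation).

    clauses: list of clauses, each a list of non-zero ints (x or -x).
    assignment: dict variable -> 0/1, extended in place with implied values.
    Returns (conflict, explanations): conflict is an unsatisfied clause or
    None; explanations maps each implied variable to the assignments of the
    remaining literals of the clause that implied it.
    """
    explanations = {}
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            free, satisfied = [], False
            for lit in clause:
                val = assignment.get(abs(lit))
                if val is None:
                    free.append(lit)
                elif val == (1 if lit > 0 else 0):
                    satisfied = True
                    break
            if satisfied:
                continue
            if not free:                      # all literals assume value 0
                return clause, explanations
            if len(free) == 1:                # unit clause: imply the literal
                lit = free[0]
                explanations[abs(lit)] = {
                    abs(l): assignment[abs(l)] for l in clause if l != lit
                }
                assignment[abs(lit)] = 1 if lit > 0 else 0
                changed = True
    return None, explanations


# The example from the text: (x1 or not x2 or x3) with x1 = 0 and x3 = 0.
assignment = {1: 0, 3: 0}
conflict, explanations = bcp([[1, -2, 3]], assignment)
print(assignment[2], explanations[2], conflict)
```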

3 Backtrack Search SAT Algorithms 

The overall organization of a generic backtrack search SAT algorithm is
shown in Figure 1. This SAT algorithm captures the organization of several of
the most competitive algorithms [2, 9, 18, 24]. The algorithm conducts a search
through the space of the possible truth assignments to the problem instance
variables. At each stage of the search, a truth assignment is selected with the
Decide() function. (Observe that selected variable assignments are characterized
by having no explanation.) A decision level d is also associated with each
selection of an assignment. Moreover, a decision level $\delta(x)$ is associated with
each assigned variable x, denoting the decision level at which the variable is
assigned.

Implied necessary assignments are identified with the Deduce() function,
which in most cases corresponds to the BCP procedure [23]. Whenever a clause
becomes unsatisfied, the Deduce() function returns a conflict indication, which is
then analyzed using the Diagnose() function. The diagnosis of a given conflict
returns a backtracking decision level, which corresponds to the decision level to
which the search process can provably backtrack. The Erase() function clears
implied assigned variables that result from each assignment selection. Different
organizations of SAT algorithms can be modeled by this generic algorithm,
examples of which include POSIT [9] and NTAB [5].

Currently, and for solving large, structured and hard instances of SAT, all 
of the most efficient SAT algorithms implement a number of the following key 
properties: 

1. The analysis of conflicts can be used for implementing Non-Chronological
Backtracking search strategies. Hence, assignment selections that are deemed
irrelevant can be skipped during the search [2, 18, 24].

2. The analysis of conflicts can also be used for identifying and recording new
clauses that denote implicates of the Boolean function associated with the
CNF formula. Clause recording plays a key role in recent SAT algorithms,
even though in most cases large recorded clauses are eventually deleted [2, 18,
24].

3. Other techniques have been developed. Relevance-Based Learning [2]
extends the life-span of large recorded clauses that will eventually be deleted.
Conflict-Induced Necessary Assignments [18] denote assignments of variables
which are necessary for preventing a given conflict from occurring again
during the search.

SAT(d, &β)
{
    if (Decide(d) != DECISION)            /* no assignment was selected */
        return SATISFIABLE;

    while (TRUE) {
        if (Deduce(d) != CONFLICT) {      /* identify implied assignments */
            if (SAT(d + 1, β) == SATISFIABLE)
                return SATISFIABLE;
            else if (β != d || d == 0) {
                Erase(d); return UNSATISFIABLE;
            }
        }
        if (Diagnose(d, β) == CONFLICT) { /* analyze the conflict */
            return UNSATISFIABLE;
        }
    }
}

Fig. 1. Generic SAT Algorithm

Before running the SAT algorithm, different forms of preprocessing can be 
applied [8, 18, 17]. This in general is denoted by a Preprocess () function that 
is executed before invoking the search process. 
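To make the roles of the functions discussed in this section concrete, the following Python sketch implements a deliberately simplified backtrack search. It is an illustration only: it backtracks chronologically and performs no conflict diagnosis or clause recording, so it corresponds to a plain Davis-Putnam style search rather than to rel_sat, GRASP or SATO. The function names, the signed-integer literals, and the trivial choice of the first free variable in place of a real Decide() heuristic are assumptions made here.

```python
def value(lit, assignment):
    """Value (0/1) of a literal under a partial assignment, or None if free."""
    v = assignment.get(abs(lit))
    return None if v is None else (v if lit > 0 else 1 - v)


def deduce(clauses, assignment, trail):
    """BCP. Returns False on conflict; implied variables go on the trail."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            vals = [value(l, assignment) for l in clause]
            if 1 in vals:
                continue                       # clause already satisfied
            free = [l for l, v in zip(clause, vals) if v is None]
            if not free:
                return False                   # unsatisfied clause: conflict
            if len(free) == 1:                 # unit clause rule
                assignment[abs(free[0])] = 1 if free[0] > 0 else 0
                trail.append(abs(free[0]))
                changed = True
    return True


def sat(clauses, n_vars, assignment=None):
    """Return True iff satisfiable; on success `assignment` holds a model."""
    assignment = {} if assignment is None else assignment
    trail = []
    if deduce(clauses, assignment, trail):
        free = [x for x in range(1, n_vars + 1) if x not in assignment]
        if not free:
            return True
        x = free[0]                            # placeholder for Decide()
        for v in (1, 0):
            assignment[x] = v
            if sat(clauses, n_vars, assignment):
                return True
            del assignment[x]
    for x in trail:                            # Erase(): undo implied values
        del assignment[x]
    return False


model = {}
print(sat([[1, 2], [-1, 2], [1, -2]], 2, model), model)
```

Any of the branching heuristics of the next section could replace the line marked as a placeholder for Decide().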

4 Branching Heuristics 

This section describes the branching heuristics that are experimentally evaluated
in Section 5. The simplest heuristic is to randomly select one of the
yet unassigned variables and assign it a randomly chosen value. We shall
refer to this heuristic as RAND. Most, if not all, of the most effective branching
heuristics take into account the dynamic information provided by the backtrack
search algorithm. This information can include, for example, the number of literals
of each variable in unresolved clauses and the relative sizes of the unresolved
clauses that contain literals of a given variable.

In the following sections we describe several branching heuristics which utilize
dynamic information provided by the backtrack search SAT algorithm. These
heuristics are described in the approximate chronological order in which they
have been applied to solving SAT. We should note that additional branching
heuristics exist and have been studied in the past. See for example [9, 12, 13] for
detailed accounts.

4.1 BOHM’s Heuristic 

Bohm’s heuristic is briefly described in [4], where a backtrack search al- 
gorithm using this branching heuristic was shown to be the most competitive 
algorithm (at the time), for solving randomly generated instances of SAT. 

At each step of the backtrack search algorithm, the BOHM heuristic selects
a variable with the maximal vector $(H_1(x), H_2(x), \ldots, H_n(x))$ in lexicographic
order. Each $H_i(x)$ is computed as follows:

$$H_i(x) = \alpha \max(h_i(x), h_i(\neg x)) + \beta \min(h_i(x), h_i(\neg x)) \qquad (1)$$

where $h_i(x)$ is the number of unresolved clauses with i literals that contain
literal x. Hence, each selected literal gives preference to satisfying small clauses
(when assigned value true) or to further reducing the size of small clauses (when
assigned value false). The values of $\alpha$ and $\beta$ are chosen heuristically. In [4] the
values suggested are $\alpha = 1$ and $\beta = 2$.
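A minimal sketch of this selection rule is given below. It assumes that the input is the list of unresolved clauses already restricted to their free literals, that literals are signed integers, and that the vector runs over clause sizes 1 up to the largest clause size present; these choices and the function names are assumptions, not details taken from [4].

```python
def bohm_vector(x, clauses, n, alpha=1, beta=2):
    """Vector (H_1(x), ..., H_n(x)) of equation (1) for variable x."""
    def h(i, lit):
        # Number of clauses with exactly i literals that contain lit.
        return sum(1 for c in clauses if len(c) == i and lit in c)

    return tuple(
        alpha * max(h(i, x), h(i, -x)) + beta * min(h(i, x), h(i, -x))
        for i in range(1, n + 1)
    )


def bohm_select(clauses):
    """Variable whose vector is maximal in lexicographic order."""
    variables = sorted({abs(l) for c in clauses for l in c})
    n = max(len(c) for c in clauses)
    # Python compares tuples lexicographically, which is the order required.
    return max(variables, key=lambda x: bohm_vector(x, clauses, n))


clauses = [[1, 2], [-1, 3], [2, 3, 4]]
print(bohm_select(clauses), bohm_vector(1, clauses, 3))
```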

4.2 MOM’s Heuristic

One of the most well-known and utilized branching heuristics is the Maximum 
Occurrences on clauses of Minimum size (MOM’s) heuristic [8,9, 19,23]. 

Let $f^*(l)$ be the number of occurrences of a literal $l$ in the smallest
non-satisfied clauses. It is widely accepted that a good variable to select is one that
maximizes the function

$$[f^*(x) + f^*(\neg x)] \cdot 2^k + f^*(x) \cdot f^*(\neg x) \qquad (2)$$

Intuitively, preference is given to variables x with a large number of clauses
in $x$ or in $\neg x$ (assuming k is chosen to be sufficiently large), and also to variables
with a large number of clauses in both $x$ and $\neg x$. Several variations of MOM’s
heuristic have been proposed in the past with heuristic functions related to but
different from (2). A detailed description of MOM’s heuristics can be found in [9].
We should also note that, in general, we may also be interested in taking into
account not only the smallest clauses, but also clauses of larger sizes.
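Function (2) can be evaluated directly, as in the sketch below. It illustrates the formula only, not the specific variant used in the experiments of this paper; the function names, the default k = 10, and the assumption that the input clauses are the non-satisfied clauses restricted to their free literals are all choices made here.

```python
def mom_score(x, clauses, k=10):
    """Value of function (2) for variable x over the smallest clauses."""
    smallest = min(len(c) for c in clauses)
    small = [c for c in clauses if len(c) == smallest]
    f_pos = sum(c.count(x) for c in small)     # f*(x)
    f_neg = sum(c.count(-x) for c in small)    # f*(not x)
    return (f_pos + f_neg) * 2 ** k + f_pos * f_neg


def mom_select(clauses, k=10):
    """Variable maximizing function (2)."""
    variables = sorted({abs(l) for c in clauses for l in c})
    return max(variables, key=lambda x: mom_score(x, clauses, k))


clauses = [[1, 2], [-1, 2], [1, -3], [2, 3, 4]]
print(mom_select(clauses), mom_score(1, clauses))
```

With a large k the first term dominates, so the product term only breaks ties among variables with the same total number of occurrences.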

In this paper, the implementation of MOM’s heuristic we experimented with
has the following definition:

- Select only variables in the clauses of smallest size, V.

- Among the variables in V, give preference to those with the largest number
of smallest clauses, $V_c$. If a variable appears in many of the smallest clauses,
then it is likely to induce other implied assignments, thus constraining the
search.

- For those variables in $V_c$, give preference to those that appear in the largest
number of small clauses. Again the previous intuitive justification applies.

To the selected variable assign value true if the variable appears in more
smallest clauses as a positive literal, and value false otherwise.



4.3 Jeroslow-Wang Heuristics 

Two branching heuristics were proposed by Jeroslow and Wang in [13], and are
also analyzed in [1, 12]. For a given literal $l$, let us compute:

$$J(l) = \sum_{\omega \in \varphi,\; l \in \omega} 2^{-|\omega|} \qquad (3)$$

The one-sided Jeroslow-Wang (JW-OS) branching heuristic selects the assignment
that satisfies the literal with the largest value $J(l)$. The two-sided
Jeroslow-Wang (JW-TS) heuristic identifies the variable x with the largest sum
$J(x) + J(\neg x)$, and assigns to x value true if $J(x) > J(\neg x)$, and value false
otherwise.
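Both Jeroslow-Wang variants can be sketched as follows, assuming J(l) sums 2 to the power of minus the clause size over the clauses containing l. The function names, the signed-integer literals, and the tie-breaking by smallest index are assumptions; the input is taken to be the unresolved clauses restricted to their free literals.

```python
def jw(lit, clauses):
    """J(l): sum of 2**(-|clause|) over the clauses containing lit."""
    return sum(2.0 ** -len(c) for c in clauses if lit in c)


def jw_os(clauses):
    """One-sided rule: satisfy the literal with the largest J value."""
    lits = sorted({l for c in clauses for l in c})
    best = max(lits, key=lambda l: jw(l, clauses))
    return abs(best), 1 if best > 0 else 0


def jw_ts(clauses):
    """Two-sided rule: variable with largest J(x) + J(-x), then compare sides."""
    variables = sorted({abs(l) for c in clauses for l in c})
    x = max(variables, key=lambda v: jw(v, clauses) + jw(-v, clauses))
    return x, 1 if jw(x, clauses) > jw(-x, clauses) else 0


clauses = [[1, -2], [-1, 2, 3], [-2, -3], [-2, 3, 4]]
print(jw_os(clauses), jw_ts(clauses))
```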



4.4 Literal Count Heuristics 

Besides the heuristics proposed in the previous sections, others are certainly 
possible. In this section we describe three simple branching heuristics that only 
take into account the number of literals in unresolved clauses of a given variable 
at each step of the backtrack search algorithm. 

Literal count heuristics count the number of unresolved clauses in which a
given variable x appears as a positive literal, $C_P$, and as a negative literal, $C_N$.
These two numbers can either be considered individually or combined. When
considered combined, i.e. $C_P + C_N$, we select the variable with the largest sum
$C_P + C_N$, and assign to it value true if $C_P \ge C_N$, or value false if $C_P < C_N$.
Since the $C_P$ and $C_N$ figures are computed during the search, we refer to this
heuristic as dynamic largest combined sum (of literals), or DLCS.

When the values $C_P$ and $C_N$ are considered separately, we select the variable
with the largest individual value, and assign to it value true if $C_P \ge C_N$, or
value false if $C_P < C_N$. We refer to this heuristic as dynamic largest individual
sum (of literals), or DLIS.

As we shall show in Section 5, branching heuristics can sometimes yield bad
branches because they are simply too greedy. A variation of DLIS, referred to
as RDLIS, consists in randomly selecting the value to be assigned to a given
selected variable, instead of comparing $C_P$ with $C_N$. The random selection of
the value to assign is in general a good compromise to prevent making too many
bad decisions for a few specific instances. Clearly, we could also use DLCS for
implementing a RDLCS branching heuristic.
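The three literal count heuristics differ only in the selection key and in how the value is chosen, as the following sketch shows. It is not GRASP's implementation: the function names and the signed-integer literals are assumptions, the input is taken to be the unresolved clauses restricted to their free literals, and ties between the two counts are broken toward true, which is also an assumption.

```python
import random


def literal_counts(clauses):
    """Map each variable to (C_P, C_N), its positive and negative counts."""
    counts = {}
    for clause in clauses:
        for lit in clause:
            cp, cn = counts.get(abs(lit), (0, 0))
            counts[abs(lit)] = (cp + 1, cn) if lit > 0 else (cp, cn + 1)
    return counts


def dlcs(clauses):
    """Largest combined sum C_P + C_N."""
    counts = literal_counts(clauses)
    x = max(sorted(counts), key=lambda v: sum(counts[v]))
    cp, cn = counts[x]
    return x, 1 if cp >= cn else 0             # tie toward true (assumption)


def dlis(clauses):
    """Largest individual count max(C_P, C_N)."""
    counts = literal_counts(clauses)
    x = max(sorted(counts), key=lambda v: max(counts[v]))
    cp, cn = counts[x]
    return x, 1 if cp >= cn else 0             # tie toward true (assumption)


def rdlis(clauses, rng=random):
    """DLIS variable selection with a randomly chosen value."""
    x, _ = dlis(clauses)
    return x, rng.choice((0, 1))


clauses = [[1, 2], [1, -2], [-2, 3], [-2, -3], [1, 3]]
print(dlcs(clauses), dlis(clauses))
```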
Table 1. The classes of instances analyzed

Benchmark Class    # Instances
aim-200                     24
jnh                         50
pret                         8
dubois                      13
ucsc-bf                    223
ucsc-ssa                   102



5 Experimental Results 

In this section we compare different SAT algorithms and different branching 
heuristics. Where possible, we have concentrated on analyzing practical instances 
of SAT, i.e. instances derived from real-world applications. 

The classes of instances used for the experimental evaluation are shown in
Table 1. All classes of instances are obtained from the DIMACS suite and from
the UCSC suite [14]. The number of instances for each class is also shown in the
table. Of these classes of instances, bf and ssa represent practical applications of
SAT models to Design Automation [16]. The others were proposed by different
authors for the 1993 DIMACS Satisfiability Challenge [14].

For the instances considered, we ran rel_sat, GRASP, SATO, POSIT and
NTAB. While rel_sat, GRASP and SATO implement the search pruning techniques
described in Section 3, POSIT and NTAB are mainly fast implementations
of the Davis-Putnam procedure, with different branching heuristics implemented.
As will be suggested by the experimental results, fast implementations
of the Davis-Putnam procedure are in general inadequate for solving real-world
instances of SAT. This conclusion has actually been reached by other researchers
in the past [2, 18, 24]. On the other hand, rel_sat, GRASP and SATO implement
a similar set of search pruning techniques which are shown to be very effective.
After this experiment, we concentrate on evaluating GRASP when run
with different branching heuristics, in particular those described in Section 4. It
should be mentioned that either rel_sat or SATO could be used for this purpose,
but these algorithms only provide a single branching heuristic, whereas GRASP
incorporates a significant number of the branching heuristics proposed in the
literature in recent years.

For the experimental results given below, the CPU times were obtained on
a Pentium-II 350MHz Linux machine, with 128 MByte of RAM. In all cases
the maximum CPU time that each algorithm was allowed to spend on any given
instance was 500 seconds. The SAT algorithms POSIT and NTAB were run with
the default options. rel_sat was run with learning degree of 3, whereas GRASP
and SATO were run allowing clauses of size no greater than 20 to
be recorded. Furthermore, GRASP implemented relevance-based learning with
degree 5 [2] and was run with the RDLIS branching heuristic. The additional
options of rel_sat, GRASP and SATO were set to their default values.
Table 2. CPU times for different SAT algorithms

Class       rel_sat      sato     grasp     posit       ntab
aim-200        0.20      0.51      3.45      0.14    6787.03
bf             1.08      1.33      2.59      6.64    1585.26
dubois         0.09      0.57     17.69    364.68    5317.37
ii16         135.34      2.25    120.36     26.67    1022.32
ii32         399.32      4.95    580.65      2.89      10.7C
jnh            0.62      0.93      8.86      0.18      14.90
pret           1.07      1.46      3.30    173.26    2460.31
ssa           17.02      1.71      2.87     15.90    1006.46
ucsc-bf      194.22     68.79    115.14   2642.99   89616.31
ucsc-ssa     149.12     24.23     32.32    518.31    6519.14



Table 3. Number of aborted instances for different SAT algorithms

Class       rel_sat   sato   grasp   posit   ntab
aim-200           0      0       0      12     13
bf                0      0       0       2      3
dubois            0      0       0       9      9
ii16              0      0       0       1      2
ii32              0      0       0       0      0
jnh               0      0       0       0      0
pret              0      0       0       4      4
ssa               0      0       0       0      2
ucsc-bf           0      0       0     100    174
ucsc-ssa          0      0       0       2     12
Total             0      0       0     130    219



5.1 Results for Different SAT Algorithms 

The experimental results for rel_sat, GRASP, SATO, POSIT and NTAB,
on selected classes from the DIMACS and the UCSC instances, are shown in 
Tables 2, 3 and 4. Table 2 includes the CPU times for each class of instances. 
Table 3 indicates the number of instances each algorithm was unable to solve in 
the allowed CPU time. Finally, Table 4 indicates the total number of decisions 
made by each algorithm for each class of instances. This figure provides an idea 
of the amount of search actually conducted by the different algorithms. For 
instances aborted by any given algorithm, the number of decisions accounted 
for is 0. For example, for POSIT and for class aim-200, the number of decisions 
shown are solely for the instances POSIT was able to solve (i.e. 12), which 
represent 50% of all instances in class aim-200. 

It should be emphasized that the results for the UCSC benchmarks are partic- 
ularly significant, since they result from actual practical applications of SAT [16]. 

From the above results several conclusions can be drawn: 
Table 4. Number of branches for different SAT algorithms

Class       rel_sat      sato     grasp     posit      ntab
aim-200        1074      1597      4098       580    937782
bf              992      1708      1083     14914     48143
dubois          883      1854     18399   9662919        --
ii16          36841      3993      6644        --      3946
ii32          84123      8321     11566       415        --
jnh            1135      1284      3651       280       482
pret           4584      6012     15241   4187100   419430C
ssa           16960      2904      1780     56519      3180
ucsc-bf      147540    108469     43890        --        --
ucsc-ssa     138997     44001     19018   1428196    367671



- Backtrack search SAT algorithms, based on plain implementations of the
  Davis-Putnam procedure, are clearly inadequate for solving a significant
  fraction of the instances studied.

- The search pruning techniques included in more recent algorithms, e.g.
  rel_sat, SATO and GRASP, are clearly effective in most classes of instances,
  and for some classes of instances they are essential.

From these results, and since rel_sat, SATO and GRASP use different branching
heuristics, one might be tempted to extrapolate that the branching heuristic
used is irrelevant when effective search pruning techniques are implemented by
SAT algorithms. In general, this is not the case, as we shall see in the following
sections. Nevertheless, we will be able to provide evidence that in most cases,
and for practical instances of SAT, the set of search pruning techniques used by
a backtrack search SAT algorithm plays a more significant role than the actual
branching heuristic used.

5.2 Results for Different Branching Heuristics 

In this section we use GRASP for evaluating the effect of branching heuristics 
in recent SAT algorithms. (GRASP was selected because it is the only algorithm 
that implements several branching heuristics proposed by different authors [13, 
9,4].) 

The experimental results for GRASP, on the same classes of instances as in
the previous section, and for the branching heuristics described in Section 4,
are shown in Tables 5, 6, 7, 8 and 9, which respectively present the number
of aborted instances, the CPU times, the total number of decisions, the total
number of backtrack steps taken, and the percentage of backtrack steps that
were taken non-chronologically.

From the experimental results, the following conclusions can be drawn: 

- GRASP obtains similar results with most branching heuristics. A more
  detailed analysis of the experimental data actually reveals substantial
  differences only for a few of the few hundred instances evaluated.
Table 5. Number of aborted instances for GRASP with different branching heuristics

Class       BOHM   DLCS   DLIS   JW-OS   JW-TS   MOM   RAND   RDLIS
aim-200     0      0      0      0       0       0     0      0
bf          0      0      0      0       0       0     0      0
dubois      0      0      0      0       0       0     1      0
ii16        2      2      1      1       1       1     0      0
ii32        1      1      0      1       1       1     1      0
jnh         0      0      0      0       0       0     0      0
pret        0      0      0      0       0       0     4      0
ssa         1      0      0      0       0       1     0      0
ucsc-bf     2      0      0      0       0       2     0      0
ucsc-ssa    6      0      0      0       0       6     0      0
Total       12     3      1      2       2       11    6      0



Table 6. GRASP CPU times with different branching heuristics

Class       BOHM      DLCS     DLIS     JW-OS    JW-TS    MOM       RAND     RDLIS
aim-200     1.03      3.20     3.42     1.60     1.39     ?         1?       3.1?
bf          5.98      2.25     2.25     3.95     3.31     ?         59.73    2.65
dubois      0.35      12.65    11.97    31.87    ?        0.36      ?        17.4?
ii16        2043.32   970.78   431.16   696.94   ?        1178.21   13.2?    120.1?
ii32        2179.03   858.19   2.01     345.39   334.21   1250.0?   204.7?   574.3?
jnh         2.10      3.99     6.02     3.14     2.42     ?         71.01    8.75
pret        1.88      3.58     3.58     3.89     3.87     1.88      809.0?   3.34
ssa         167.86    4.07     2.56     4.62     7.29     168.35    2.47     2.87
ucsc-bf     550.43    100.57   83.54    88.08    93.17    526.0?    159.3?   115.2?
ucsc-ssa    1195.58   34.54    24.87    47.21    225.21   1196.4?   65.27    ?


- For a few classes of instances, the branching heuristics most often used by
  SAT algorithms (i.e. BOHM and MOM) end up yielding worse results. Our
  interpretation is that these heuristics are simply too greedy, and can for some
  instances make too many bad branches.

- The plain, straightforward, randomized branching heuristic, RAND,
  compares favorably with the other heuristics, and actually performs better (in
  GRASP and for the classes of instances considered) than BOHM's or MOM's
  heuristics.

- Randomization can actually be a powerful branching mechanism. For
  example, while DLIS aborts one instance, RDLIS, by not being so greedy as
  DLIS, aborts none. In general the run times for RDLIS are slightly larger
  than those for DLIS, but RDLIS is less likely to make bad branches that
  can cause a SAT algorithm to eventually quit on a given instance.
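
The contrast between DLIS and RDLIS drawn in the last item can be sketched in a few lines of Python. This is only an illustrative sketch, not GRASP's implementation: it assumes the usual reading of DLIS (branch on the literal that occurs in the largest number of unresolved clauses and assign it true) and of RDLIS (same variable, value chosen at random). The clause encoding (lists of signed integers), the tie-breaking rule and all function names are assumptions made for the example.

```python
import random
from collections import Counter


def literal_counts(clauses, assignment):
    """Count unassigned literals over the clauses that are not yet satisfied."""
    counts = Counter()
    for clause in clauses:
        # A clause is resolved once one of its literals is assigned true.
        if any(assignment.get(abs(lit)) == (lit > 0) for lit in clause):
            continue
        for lit in clause:
            if abs(lit) not in assignment:
                counts[lit] += 1
    return counts


def dlis(clauses, assignment):
    """Greedy choice: the most frequent literal is assigned true."""
    counts = literal_counts(clauses, assignment)
    if not counts:
        return None
    # Assumed tie-break: smaller variable index first, then positive phase.
    best = max(counts, key=lambda lit: (counts[lit], -abs(lit), lit > 0))
    return abs(best), best > 0


def rdlis(clauses, assignment, rng=random):
    """Same variable as DLIS, but its value is picked at random."""
    choice = dlis(clauses, assignment)
    if choice is None:
        return None
    variable, _ = choice
    return variable, rng.random() < 0.5


clauses = [[1, -2], [-2, 3], [-2, -3], [2, 4]]
print(dlis(clauses, {}))
print(rdlis(clauses, {}, random.Random(0)))
```

Both functions select the same variable; only the phase differs, which is the sense in which RDLIS is less greedy than DLIS.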

Another interesting result is the percentage of non-chronological backtrack 
steps (see Table 9). In general the percentages are similar for different heuristics, 
but differences do exist. Qualitatively, the percentages tend to be similar for three 
Table 7. Number of decisions for GRASP with different branching heuristics

Class       BOHM     DLCS    DLIS    JW-OS   JW-TS   MOM     RAND     RDLIS
aim-200     1312     4009    4629    2480    ?       1199    5867     408?
bf          2086     865     1004    1142    1139    2150    13801    1081
dubois      1288     15117   15662   16481   ?       ?       ?        18399
ii16        53908    32167   20171   31367   4275?   42001   556?     664?
ii32        29099    15549   648     19949   17616   ?       5137?    1156?
jnh         1181     2422    3198    1967    1454    1150    ?        365?
pret        4904     16584   16645   16312   16346   4892    93437    1524?
ssa         14672    2587    1869    2755    3074    14674   3529     178?
ucsc-bf     81991    40792   35993   33424   3139?   778?    10430?   4389?
ucsc-ssa    114049   23398   18193   26644   3794?   ?       5924?    ?


Table 8. Number of backtracks for GRASP with different branching heuristics

Class       BOHM    DLCS    DLIS    JW-OS   JW-TS   MOM     RAND    RDLIS
aim-200     830     1901    1996    1228    1183    744     1844    1934
bf          1178    404     422     641     725     1219    6360    472
dubois      902     3917    3935    5473    4851    902     15994   4267
ii16        48839   27320   13838   20799   24895   3867?   109?    456?
ii32        25892   13868   59      11265   ?       22161   26464   8431
jnh         969     1919    2502    1416    1121    951     ?       2951
pret        1684    1870    1765    1847    1825    ?       ?       1684
ssa         10203   986     739     1139    1410    ?       115?    ?
ucsc-bf     40598   12509   9393    11536   12364   38737   ?       14307
ucsc-ssa    61179   5273    4026    6822    16231   6117?   15827   ?



main groups of heuristics: first, BOHM and MOM; second, DLIS, DLCS,
JW-OS, JW-TS and RDLIS; and finally, RAND. In general, the percentage
of non-chronological backtrack steps is highest for RAND, which is to
be expected.

6 Conclusions 

This paper analyzes different backtrack search SAT algorithms and their use 
of branching heuristics. The obtained experimental results provide evidence that, 
even though branching heuristics play an important role in solving SAT, more 
significant performance gains are in general possible, for real-world instances of 
SAT, by using techniques for pruning the amount of search. In particular, we
have studied the effect of non-chronological backtracking and clause recording.
Further validation of the conclusions presented in this paper can
be obtained by extending the experimental analysis to other real-world applica- 
tions of SAT, for which relevant sets of instances exist. 

Techniques for pruning the amount of search have been proposed over the
years in many different areas. The results presented in this paper motivate
exploring the application of additional search pruning techniques to SAT, with
the goal of allowing state-of-the-art SAT algorithms to solve an ever increasing
number of real-world instances of SAT.

Table 9. Percentage of non-chronological backtracks in GRASP

Class       BOHM    DLCS    DLIS    JW-OS   JW-TS   MOM     RAND    RDLIS
aim-200     18.19   30.35   33.92   27.44   19.61   19.35   51.0?   30.51
bf          28.35   28.22   38.86   22.62   16.14   28.22   50.25   36.8?
dubois      4.32    42.53   45.67   23.61   25.44   4.32    69.45   47.11
ii16        5.41    5.34    7.46    15.73   18.69   4.53    66.35   10.65
ii32        6.46    5.65    42.37   26.18   20.00   6.41    42.22   18.65
jnh         4.64    10.89   13.47   13.56   7.58    4.63    17.94   12.77
pret        36.22   64.12   69.18   65.62   65.75   35.96   50.0?   68.51
ssa         23.12   34.99   33.15   29.68   29.08   23.14   58.67   36.1?
ucsc-bf     33.61   40.06   44.01   35.84   32.78   32.87   60.9?   48.42
ucsc-ssa    33.35   37.78   37.98   33.35   30.69   33.35   64.12   40.2?
Abstract. We define, in this paper, for every $n \ge 1$, the n-dimensional
block algebra as a set of relations, the block relations, together with
the fundamental operations of composition, conversion and intersection.
We examine the $13^n$ atomic relations of this algebra which constitute
the exhaustive list of the permitted relations that can hold between two
blocks whose sides are parallel to the axes of some orthogonal basis in
the n-dimensional Euclidean space over the field of real numbers. We
organize these atomic relations in ascending order with the intention of
defining the concept of convexity as well as the one of preconvexity. We
will confine ourselves to the issue of the consistency of block networks
which consist of sets of constraints between a finite number of blocks.
Showing that the concepts of convexity and preconvexity are preserved
by the fundamental operations, we prove the tractability of the problem
of the consistency of strongly preconvex block networks, on account of
our capacity for deciding it in polynomial time by means of the
path-consistency algorithm.

Keywords: Spatial reasoning, Constraint satisfaction problems, Tractability.



1 Introduction 

It is a truth universally acknowledged that representation and reasoning about
space and time is the heart of the matter for several fields of artificial
intelligence and computer science: geographical information systems, natural language
understanding, specification and verification of programs and systems, temporal
and spatial databases, temporal and spatial planning, etcetera. As far as the
problem of space and time is concerned, it would be impossible to exaggerate
the importance of interval algebra as well as the one of region algebra. On the
one hand, the model of the intervals has been put forward by Allen [1] who
provides a framework for representation and reasoning about temporal relations
between intervals while extending the traditional methods for solving constraint
satisfaction problems. There were many who decided to discover subclasses of
interval networks whose consistency can be decided in polynomial time. It is
greatly to the credit of Nebel and Bürckert that they succeeded in proving
that reasoning about ORD-Horn relations between intervals can be done in
polynomial time by means of the path-consistency algorithm. On the other hand,
the model of the regions has been brought in by Randell, Cui and Cohn
who introduce a language for representation and reasoning about spatial
relations between regions while adapting the usual network-based framework of
constraint satisfaction. Recently, Renz and Nebel characterized a tractable
subclass of region networks the consistency of which can be decided by means
of the path-consistency algorithm. This paper extends the concept of interval to
the n-dimensional Euclidean space over the field of real numbers. In actual fact
it would be more accurate to say this line of thinking has been put forward by
Mukerjee and Güsgen [5]. However, it would hardly be an exaggeration to
say that they did not make the most of their idea. The formalism that we obtain
provides a framework for representation and reasoning about spatial relations
between blocks whose sides are parallel to the axes of some orthogonal basis in
such a space. Adapting the line of reasoning suggested by Ligozat, it lays out
an ascending order between the $13^n$ atomic relations of the block algebra with
a view to defining the concepts of convexity and preconvexity. We will confine
ourselves to the issue of the consistency of block networks which consist of sets
of constraints between a finite number of blocks. Showing that the concepts of
convexity and preconvexity are preserved by the fundamental operations of
composition, inversion and intersection, we prove the tractability of the problem of
the consistency of strongly preconvex block networks, on account of our capacity
for deciding it in polynomial time by means of the path-consistency algorithm.



2 The n-Block Algebra 



By way of introduction, let us set out, for every positive integer $n \ge 1$, the
n-dimensional block algebra $\mathcal{A}_n$. The issue at stake here is one of representation
and reasoning about the possible relations that can hold between two blocks of
the Euclidean space of dimension n over the field of real numbers. Therefore, we
will confine ourselves to giving an account of the permitted connections between
two n-blocks or, to be more precise, between two Cartesian products of n intervals
of the real line. In this respect, for every n-block x and for every $i \in \{1, \ldots, n\}$,
let $x_{\downarrow i}$ be the projection of x onto the i-th axis. It is worth mentioning here
that 1-blocks are equivalent to intervals of the real line whereas 2-blocks are
equivalent to rectangles whose sides are parallel to the axes of some orthogonal
basis.

Let us now consider the well-known interval algebra (IA), whose 13 atomic
relations

$$\mathcal{B}_{int} = \{p, m, o, s, d, f, pi, mi, oi, si, di, fi, eq\}$$

constitute the exhaustive list of the relations that can hold between two intervals
of the real line, see figure 1.

Fig. 1. Atomic relations in the interval algebra

In the light of the links between n-blocks and their projections
$x_{\downarrow 1}, \ldots, x_{\downarrow n}$, the set $\mathcal{B}_n$ of the $13^n$ atomic relations of $\mathcal{A}_n$ is defined
from these 13 atomic relations between intervals in the following way:

$$\mathcal{B}_n = \{(p_1, \ldots, p_n) : \forall i \in \{1, \ldots, n\},\ p_i \in \mathcal{B}_{int}\},$$

so two n-blocks x and y satisfy the atomic relation $(p_1, \ldots, p_n) \in \mathcal{B}_n$ if and only
if, for every $i \in \{1, \ldots, n\}$, the intervals $x_{\downarrow i}$ and $y_{\downarrow i}$ satisfy the atomic relation
$p_i \in \mathcal{B}_{int}$. From all the evidence it is clear that the atomic relations of $\mathcal{B}_n$ are
mutually exclusive and complete, for the simple reason that each pair x, y of
n-blocks satisfies exactly one of these relations.

Example 1 For example, figure 2 represents two 3-blocks, x and y, which
satisfy the atomic relation (p, m, o). Indeed, the interval $x_{\downarrow 1}$ precedes the interval
$y_{\downarrow 1}$ on axis 1, the intervals $x_{\downarrow 2}$ and $y_{\downarrow 2}$ satisfy the atomic relation
meets on axis 2, and $x_{\downarrow 3}$ overlaps $y_{\downarrow 3}$.

The n-block algebra $\mathcal{A}_n$ is made of a set of $2^{13^n}$ relations, the block relations,
which are sets of atomic relations in $\mathcal{B}_n$. It is important to be clear that, for
every block relation R and for every pair of n-blocks x, y, it is the case that x R y iff
there exists an atomic relation $(p_1, \ldots, p_n) \in R$ such that $x\,(p_1, \ldots, p_n)\,y$. In
other respects, there is a very important point here and that is that $\mathcal{A}_1$
coincides with the interval algebra introduced by Allen [1] while $\mathcal{A}_2$ is nothing but
the rectangle algebra expounded by Balbiani, Condotta and Fariñas [3].

Fig. 2. The atomic relation (p, m, o) satisfied between two 3-blocks x and y.

Let us remind the reader of the fact that, for every atomic relation
$p = (p_1, \ldots, p_n) \in \mathcal{B}_n$ and for every $i \in \{1, \ldots, n\}$, the notation $p_{\downarrow i}$ represents the atomic
relation $p_i$ of $\mathcal{B}_{int}$. Similarly, for every block relation $R \in 2^{\mathcal{B}_n}$, let $R_{\downarrow i}$ be the
projection of R onto the i-th axis or, to be more precise, the interval relation

$$R_{\downarrow i} = \{p_{\downarrow i} : p \in R\}.$$



Following the line of reasoning suggested by Ligozat, the atomic relations of
$\mathcal{B}_{int}$ are arranged in the partial ascending order $\le$ which defines the interval
lattice of figure 3(a). Adapting the definition worked out by Ligozat, the atomic
relations of $\mathcal{B}_n$ are arranged in the following partial ascending order: for all $p, q \in \mathcal{B}_n$,

$$p \preceq q \ \text{ iff } \ \forall i \in \{1, \ldots, n\},\ p_{\downarrow i} \le q_{\downarrow i}.$$

The pair $(\mathcal{B}_n, \preceq)$ defines a lattice: the n-block lattice.

Definition 1. Adapting these concepts to the atomic relations in $\mathcal{B}_n$, let us
define the dimension $dim(p)$ of an atomic relation p in $\mathcal{B}_n$ as follows:
$dim(p) = dim(p_{\downarrow 1}) + \ldots + dim(p_{\downarrow n})$. What is more, given a block relation R in $2^{\mathcal{B}_n}$, let
$dim(R) = \max\{dim(p) : p \in R\}$ be the dimension of R. Similarly, let
$C(p) = C(p_{\downarrow 1}) \times \ldots \times C(p_{\downarrow n})$ be the topological closure of the atomic relation p in $\mathcal{B}_n$.
In addition, let $C(R) = \bigcup\{C(p) : p \in R\}$ be the topological closure of a block
relation R in $2^{\mathcal{B}_n}$.

The inescapable conclusion is that the dimension of an atomic relation p in $\mathcal{B}_n$ is
equal to $2n$, the maximal number of endpoint equalities between two n-blocks,
minus the number of endpoint equalities imposed by p.

Example 2 Just consider, by way of illustration, the atomic relation (p, m, o) of
$\mathcal{B}_3$, see figure 2. Seeing that it imposes 1 endpoint equality, $dim((p, m, o)) = 5$.
To take another example, $dim((eq, s, f)) = 2$ by virtue of the fact that the atomic
relation (eq, s, f) of $\mathcal{B}_3$ forces 4 endpoint equalities.
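
The arithmetic of Example 2 can be checked mechanically. The sketch below is illustrative only: atomic block relations are encoded as tuples of strings, and the per-relation dimensions are the usual values for interval algebra (2 minus the number of endpoint equalities), stated here as an assumption; the names `INTERVAL_DIM`, `dim_atomic` and `dim_relation` are hypothetical.

```python
# Dimension of each atomic interval relation: 2 minus the number of
# endpoint equalities it imposes (assumed standard values).
INTERVAL_DIM = {
    "p": 2, "pi": 2, "o": 2, "oi": 2, "d": 2, "di": 2,
    "m": 1, "mi": 1, "s": 1, "si": 1, "f": 1, "fi": 1,
    "eq": 0,
}


def dim_atomic(p):
    """dim(p) = dim(p_1) + ... + dim(p_n) for an atomic block relation."""
    return sum(INTERVAL_DIM[r] for r in p)


def dim_relation(R):
    """dim(R) = max of dim(p) over the atomic relations p in R."""
    return max(dim_atomic(p) for p in R)


print(dim_atomic(("p", "m", "o")))
print(dim_atomic(("eq", "s", "f")))
```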

It should be stressed that the concept of dimension and the one of topological
closure are linked together as follows. Let p, q be atomic relations in $\mathcal{B}_n$. It is
undoubtedly true that q belongs to the topological closure of p if and only if
there exists a sequence $r_1, \ldots, r_m$ of atomic relations in $\mathcal{B}_n$ such that $r_1 = p$,
$r_m = q$ and, for every $i \in \{1, \ldots, m - 1\}$, $r_i$ is adjacent to $r_{i+1}$ in the n-block lattice
whereas $dim(r_i) > dim(r_{i+1})$. Moreover, two of the most important features
of interval algebra are the concepts of dimension and topological closure of an
atomic relation in $\mathcal{B}_{int}$. Relating to this matter, see figure 1 as well as figure 3(b).

Fig. 3. (a) The interval lattice $(\mathcal{B}_{int}, \le)$, (b) The topological closure of $\mathcal{B}_{int}$.

Example 3 Let us consider the block relation {(p, s, si), (m, s, mi)} in $\mathcal{A}_3$, whose
topological closure is equal to the block relation {(p, s, si), (m, s, si), (p, eq, si),
(m, eq, si), (p, s, eq), (m, s, eq), (p, eq, eq), (m, eq, eq), (m, s, mi), (m, eq, mi)}.
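
Example 3 can be reproduced with a short sketch. It is not taken from the paper: the table of closures of the 13 atomic interval relations is the usual one (strict endpoint inequalities relaxed to non-strict ones) and is stated here as an assumption, as are the names `INTERVAL_CLOSURE` and `closure`.

```python
from itertools import product

# Topological closure of each atomic interval relation (assumed table).
INTERVAL_CLOSURE = {
    "p": {"p", "m"},
    "m": {"m"},
    "o": {"o", "m", "s", "fi", "eq"},
    "s": {"s", "eq"},
    "d": {"d", "s", "f", "eq"},
    "f": {"f", "eq"},
    "eq": {"eq"},
    "pi": {"pi", "mi"},
    "mi": {"mi"},
    "oi": {"oi", "mi", "si", "f", "eq"},
    "si": {"si", "eq"},
    "di": {"di", "si", "fi", "eq"},
    "fi": {"fi", "eq"},
}


def closure(R):
    """C(R): union over p in R of C(p_1) x ... x C(p_n)."""
    result = set()
    for p in R:
        result |= set(product(*(INTERVAL_CLOSURE[r] for r in p)))
    return result


R = {("p", "s", "si"), ("m", "s", "mi")}
for atom in sorted(closure(R)):
    print(atom)
```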

In order to focus attention on the existence, for every block relation R in $\mathcal{A}_n$, of
the least interval of the n-block lattice $(\mathcal{B}_n, \preceq)$ that contains R, let us consider
the following definition.

Definition 1 For every block relation R in $\mathcal{A}_n$, the least interval $I(R)$ of the
n-block lattice $(\mathcal{B}_n, \preceq)$ that contains R is called the convex closure of R.

Example 4 In $\mathcal{A}_3$, we have $I(\{(p, s, si), (m, s, mi)\}) = [(p, s, si), (m, s, mi)] =$
{(p, s, si), (p, s, oi), (p, s, mi), (m, s, si), (m, s, oi), (m, s, mi)}.
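
The convex closure of Example 4 can be computed with the following sketch. It rests on an assumption that is not spelled out in the text: each atomic interval relation is encoded, in the manner of Ligozat, as a pair of integers giving the position of each endpoint of x relative to the endpoints of y (0: before y's start, 1: equal to it, 2: strictly inside, 3: equal to y's end, 4: after it), and the lattice order is the componentwise order on these pairs. The names `ENC`, `interval_hull` and `convex_closure` are hypothetical.

```python
from itertools import product

# Assumed Ligozat-style encoding of the 13 atomic interval relations.
ENC = {
    "p": (0, 0), "m": (0, 1), "o": (0, 2), "fi": (0, 3), "di": (0, 4),
    "s": (1, 2), "eq": (1, 3), "si": (1, 4),
    "d": (2, 2), "f": (2, 3), "oi": (2, 4),
    "mi": (3, 4), "pi": (4, 4),
}


def interval_hull(rels):
    """Least lattice interval containing a set of atomic interval relations."""
    points = [ENC[r] for r in rels]
    lo_a, hi_a = min(a for a, _ in points), max(a for a, _ in points)
    lo_b, hi_b = min(b for _, b in points), max(b for _, b in points)
    return {r for r, (a, b) in ENC.items()
            if lo_a <= a <= hi_a and lo_b <= b <= hi_b}


def convex_closure(R):
    """I(R): product of the hulls of the projections of R onto each axis."""
    n = len(next(iter(R)))
    axes = [interval_hull({p[i] for p in R}) for i in range(n)]
    return set(product(*axes))


R = {("p", "s", "si"), ("m", "s", "mi")}
for atom in sorted(convex_closure(R)):
    print(atom)
```

Computing the hull axis by axis relies on the identity $I(R) = I(R_{\downarrow 1}) \times \ldots \times I(R_{\downarrow n})$.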

It is indisputable that the functions C and I which map block relations to their
topological and convex closures are monotonic operators, for the simple reason
that, for every block relation R and S, if $R \subseteq S$ then $C(R) \subseteq C(S)$ and
$I(R) \subseteq I(S)$. We have also $I(C(R)) \subseteq C(I(R))$, but the converse inclusion does
not always hold. In other respects, it is undeniable that

Proposition 1 Let $R, S \in 2^{\mathcal{B}_n}$ be block relations. Then,

- $R \subseteq R_{\downarrow 1} \times \ldots \times R_{\downarrow n}$;

- $dim(R) \le dim(R_{\downarrow 1}) + \ldots + dim(R_{\downarrow n})$;

- $dim(I(R)) \ge dim(R)$;

- $dim(C(R)) = dim(R)$;

- $I(R) = I(R_{\downarrow 1}) \times \ldots \times I(R_{\downarrow n})$;

- $C(R) \subseteq C(R_{\downarrow 1}) \times \ldots \times C(R_{\downarrow n})$.

Before going into the issue of preconvexity, we wish first to lay out the concept of
saturation. A relation R of $2^{\mathcal{B}_n}$ is saturated iff it is equal to a Cartesian product
of interval relations. This means that R is saturated iff $R = R_{\downarrow 1} \times \ldots \times R_{\downarrow n}$.

Example 5 {(pi, f, o), (pi, f, m), (pi, eq, o), (pi, eq, m), (pi, m, o), (pi, m, m)} is a
saturated relation of $2^{\mathcal{B}_3}$ which corresponds to the Cartesian product:
{pi} x {f, eq, m} x {o, m}.
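
Saturation is easy to test directly from its definition. The sketch below is illustrative, with hypothetical function names and with block relations encoded as sets of tuples of strings; it checks the relation of Example 5 and a small relation that is not a Cartesian product.

```python
from itertools import product


def projections(R):
    """Projections of a block relation onto each of its n axes."""
    n = len(next(iter(R)))
    return [{p[i] for p in R} for i in range(n)]


def is_saturated(R):
    """R is saturated iff it equals the product of its projections."""
    return set(R) == set(product(*projections(R)))


example5 = {
    ("pi", "f", "o"), ("pi", "f", "m"), ("pi", "eq", "o"),
    ("pi", "eq", "m"), ("pi", "m", "o"), ("pi", "m", "m"),
}
print(is_saturated(example5))
print(is_saturated({("d", "si", "d"), ("s", "si", "eq")}))
```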

What is more,

Proposition 2 Let R be a saturated relation of $2^{\mathcal{B}_n}$. Then,

$dim(R) = dim(R_{\downarrow 1}) + \ldots + dim(R_{\downarrow n})$ and $C(R) = C(R_{\downarrow 1}) \times \ldots \times C(R_{\downarrow n})$.

3 Some Particular Subsets of $2^{\mathcal{B}_n}$

In this section, we will set out the concept of convexity as well as the one of
preconvexity. First of all, let us extend to $\mathcal{A}_n$ the concept of convexity.

Definition 2 Let R be a relation of $2^{\mathcal{B}_n}$. R is convex iff R corresponds to an
interval in the n-block lattice $(\mathcal{B}_n, \preceq)$, that is to say: iff $I(R) = R$.

Example 6 {(eq, m, p), (eq, m, m), (eq, m, o), (s, m, p), (s, m, m), (s, m, o)} is a
convex relation of $2^{\mathcal{B}_3}$ which corresponds to the interval [(s, m, p), (eq, m, o)].
The convex relations have a strong power of expressiveness; indeed, they can
express orientation directions. For example, the convex relation "the right side
of the 3-block x is on the left of the left side of the 3-block y" corresponds to
[(p, p, p), (p, pi, pi)].

There is no doubt that a convex relation R of $2^{\mathcal{B}_n}$ is equal to the Cartesian
product of convex interval relations. In conclusion, if R is a convex n-block
relation then R is a saturated n-block relation. What is more, we can also prove
that the Cartesian product of n convex interval relations is a convex n-block
relation. Let us now consider the following proposition which links together the
concept of dimension and the one of convexity.

Proposition 3 Let R be a convex relation in $2^{\mathcal{B}_n}$. Then, for every atomic
relation $p \in R$, there exists an atomic relation $q \in R$ such that $dim(R) = dim(q)$
and $p \in C(q)$.

Proof. Firstly, we prove it for $2^{\mathcal{B}_{int}}$ by examining the exhaustive list of the
convex relations of $\mathcal{A}_1$. Let R be a convex relation of $2^{\mathcal{B}_n}$. We recall that
$R = R_{\downarrow 1} \times \ldots \times R_{\downarrow n}$ with $R_{\downarrow i}$ a convex relation of $\mathcal{A}_1$. Moreover, R is saturated,
so $C(R) = C(R_{\downarrow 1}) \times \ldots \times C(R_{\downarrow n})$ and $dim(R) = dim(R_{\downarrow 1}) + \ldots + dim(R_{\downarrow n})$
(proposition 2). From all this, the proposition can be easily proved for n > 1. □

Adapting the line of reasoning suggested by Ligozat, we extend the concept
of preconvexity as follows.

Definition 3 The n-block relation R is weakly-preconvex if and only if $C(R)$ is
a convex relation of $2^{\mathcal{B}_n}$ whereas it is strongly-preconvex if and only if, for every
convex n-block relation S, $R \cap S$ is a weakly-preconvex relation.

The expressions $\mathcal{W}$ and $\mathcal{S}$ will denote respectively the set of the weakly-preconvex
relations and the set of the strongly-preconvex relations of $2^{\mathcal{B}_n}$. Seeing that the
total relation, which is equal to the set of the atomic relations of $\mathcal{B}_n$, is a convex
relation, it appears that $\mathcal{S} \subseteq \mathcal{W}$. It should not be forgotten that, in dimension
1, the sets $\mathcal{W}$ and $\mathcal{S}$ coincide with the set of the preconvex interval relations
studied by Ligozat [7,8]. On the other hand, if n > 1 then $\mathcal{S}$ is strictly included
in $\mathcal{W}$. Moreover, in the next section, we will pursue the question of whether
every convex relation is weakly preconvex or not.

Example 7 {(d, si, d), (s, si, eq), (s, eq, eq), (eq, si, s)} is a weakly-preconvex
relation of $2^{\mathcal{B}_3}$ which is not a strongly-preconvex relation (the intersection
between this relation and the convex relation [(s, eq, s), (eq, si, eq)] does not belong
to $\mathcal{W}$). Whereas, {(d, si, d), (s, si, s), (s, si, eq), (eq, si, s)} is a weakly-preconvex
and strongly-preconvex relation of $2^{\mathcal{B}_3}$. These relations are not saturated.
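
The first claim of Example 7 can be verified with a small sketch of the test "C(R) is convex". As before, this is an illustration under assumptions rather than the authors' procedure: atomic interval relations are encoded as pairs of endpoint positions in the manner of Ligozat (even values are open regions, odd values are endpoints of y), the closure lets each even coordinate move to an adjacent odd one, and all names are hypothetical.

```python
from itertools import product

# Assumed Ligozat-style encoding of the 13 atomic interval relations.
ENC = {
    "p": (0, 0), "m": (0, 1), "o": (0, 2), "fi": (0, 3), "di": (0, 4),
    "s": (1, 2), "eq": (1, 3), "si": (1, 4),
    "d": (2, 2), "f": (2, 3), "oi": (2, 4),
    "mi": (3, 4), "pi": (4, 4),
}


def _near(c):
    # An odd coordinate is an equality and stays put; an even one may
    # collapse onto either neighbouring endpoint.
    return {c} if c % 2 else {c - 1, c, c + 1}


def interval_closure(r):
    a, b = ENC[r]
    return {s for s, (x, y) in ENC.items() if x in _near(a) and y in _near(b)}


def interval_hull(rels):
    pts = [ENC[r] for r in rels]
    lo = (min(a for a, _ in pts), min(b for _, b in pts))
    hi = (max(a for a, _ in pts), max(b for _, b in pts))
    return {r for r, (a, b) in ENC.items()
            if lo[0] <= a <= hi[0] and lo[1] <= b <= hi[1]}


def closure(R):
    out = set()
    for p in R:
        out |= set(product(*(interval_closure(r) for r in p)))
    return out


def convex_closure(R):
    n = len(next(iter(R)))
    return set(product(*(interval_hull({p[i] for p in R}) for i in range(n))))


def is_weakly_preconvex(R):
    """Definition 3: R is weakly-preconvex iff C(R) is convex."""
    c = closure(R)
    return convex_closure(c) == c


R = {("d", "si", "d"), ("s", "si", "eq"), ("s", "eq", "eq"), ("eq", "si", "s")}
S = convex_closure({("s", "eq", "s"), ("eq", "si", "eq")})
print(is_weakly_preconvex(R), is_weakly_preconvex(R & S))
```

Checking strong preconvexity in full would require repeating the last test for every convex relation S, which the sketch does not attempt.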

Following the line of reasoning suggested by Ligozat, we must conclude that

Proposition 4 Let $R \in 2^{\mathcal{B}_n}$. The three following properties are equivalent:

- $C(R)$ is a convex relation,

- $I(R) \subseteq C(R)$,

- $dim(I(R) \setminus R) < dim(I(R))$.

As a consequence, a relation R of $2^{\mathcal{B}_n}$ is a weakly-preconvex relation if and only
if, to compute its convex closure, we add to it only atomic relations of dimension
strictly less than its own dimension. From all this, it follows that:

Proposition 5 Let $R \in 2^{\mathcal{B}_n}$ be an n-block relation. If R is a weakly-preconvex
relation then, for every atomic relation $p \in I(R)$, if $dim(p) = dim(I(R))$ then
$p \in R$ and $dim(p) = dim(R)$.

Hence, if $R \in \mathcal{W}$ then p is a maximal atomic relation in R if, and only if, p is a
maximal atomic relation in $I(R)$.

4 Fundamental Operations 

Now we are in a position to introduce the fundamental operations of composition,
intersection and inverse. Obviously, the intersection of two n-block relations R
and S is equal to the relation $\{p : p \in R \text{ and } p \in S\}$. Later in this section,
we will demonstrate that the intersection of two convex n-block relations is a
convex relation. With regard to composition and inverse, we have no alternative
but to put forward the following definitions.

Definition 4 Let p, q be atomic relations in $\mathcal{B}_n$. The composition of p and q is
equal to the n-block relation $(p_{\downarrow 1} \circ q_{\downarrow 1}) \times \ldots \times (p_{\downarrow n} \circ q_{\downarrow n})$. Similarly, let R, S
be two n-block relations of $2^{\mathcal{B}_n}$. The composition $R \circ S$ is equal to the relation
$\bigcup\{(p_{\downarrow 1} \circ q_{\downarrow 1}) \times \ldots \times (p_{\downarrow n} \circ q_{\downarrow n}) : p \in R \text{ and } q \in S\}$,

where, for every atomic relation p, q between intervals, $p \circ q$ is the composition
of p and q in interval algebra.

Definition 5 Let p be an atomic relation in $\mathcal{B}_n$. The inverse of p, $p^{\smile}$, is equal
to the atomic relation $(p_{\downarrow 1}^{\smile}, \ldots, p_{\downarrow n}^{\smile})$. Similarly, let R be an n-block relation of
$2^{\mathcal{B}_n}$. The inverse $R^{\smile}$ of R is equal to the relation $\{(p_{\downarrow 1}^{\smile}, \ldots, p_{\downarrow n}^{\smile}) : p \in R\}$,

where, for every atomic interval relation p, $p^{\smile}$ is the inverse of p in interval
algebra. We now set out to prove that, for every convex n-block
relation R, S, the relations $R \circ S$, $R \cap S$ and $R^{\smile}$ are convex. Moreover, we intend
to demonstrate that every convex n-block relation is strongly preconvex. First,
in the light of the previous definitions, the inescapable conclusion is that

Proposition 6 Let R, S be relations in $2^{\mathcal{B}_n}$. If R and S are saturated then:

- $R^{\smile} = R_{\downarrow 1}^{\smile} \times \ldots \times R_{\downarrow n}^{\smile}$;

- $R \circ S = (R_{\downarrow 1} \circ S_{\downarrow 1}) \times \ldots \times (R_{\downarrow n} \circ S_{\downarrow n})$;

- $R \cap S = (R_{\downarrow 1} \cap S_{\downarrow 1}) \times \ldots \times (R_{\downarrow n} \cap S_{\downarrow n})$.

Seeing that, for every convex interval relation R, S, the interval relations $R \circ S$,
$R \cap S$ and $R^{\smile}$ are convex, this only goes to show that, for every convex relation
R, S of $\mathcal{A}_n$, the n-block relations $R \circ S$, $R \cap S$ and $R^{\smile}$ are convex too. Hence,
the convex relations of $2^{\mathcal{B}_n}$ constitute a subclass of $2^{\mathcal{B}_n}$. Let us now consider to
what extent every convex n-block relation is weakly preconvex. Let R be a convex
relation of $2^{\mathcal{B}_n}$. This implies that $I(R) = R$. Consequently, $I(R) \subseteq C(R)$ and,
according to proposition 4, we must conclude that $R \in \mathcal{W}$. Moreover, as the
concept of convexity is preserved by the fundamental operation of intersection,
this, of course, leads to the logical conclusion that the convex relations of $2^{\mathcal{B}_n}$
belong to $\mathcal{S}$. Let us consider the connection between the fundamental operations
of inverse and composition and the functions C and I.

Proposition 7 Let R, S be two n-block relations of $2^{\mathcal{B}_n}$. Then,

(a) $I(R^{\smile}) = I(R)^{\smile}$ and $C(R^{\smile}) = C(R)^{\smile}$;

(b) $I(R \circ S) \subseteq I(R) \circ I(S)$;

(c) $C(R) \circ C(S) \subseteq C(R \circ S)$.

(a) results from the symmetry of the n-block lattice. The proof of (b) is a
consequence of the fact that $I(R \circ S)$ is the least convex relation which contains $R \circ S$
and $I(R) \circ I(S)$ is a convex relation which contains $R \circ S$ too. (c) was proved
for $\mathcal{A}_1$ in [3]; from this and using proposition 5 we can prove (c) for $\mathcal{A}_n$, with
n > 1.

From these results, we can assert that: 

Proposition 8 $\mathcal{W}$ is stable for the operations: composition and inverse.

The stability for the composition is given by this: for each $R, S \in \mathcal{W}$,
$I(R \circ S) \subseteq I(R) \circ I(S) \subseteq C(R) \circ C(S) \subseteq C(R \circ S)$. On the other hand, $\mathcal{W}$ of $\mathcal{A}_n$, with
n > 1, is not stable for the intersection. The following example proves this for
$\mathcal{A}_3$ but it can be adapted for $\mathcal{A}_n$, with $n \ge 2$.

Example 8 Let us consider two weakly-preconvex relations of $2^{\mathcal{B}_3}$:
{(d, si, d), (s, si, eq), (eq, si, s)} and {(o, si, o), (s, si, eq), (eq, si, s)}. The intersection of
these relations, {(s, si, eq), (eq, si, s)}, is not a weakly-preconvex relation.

Now, let us consider the set of the strongly preconvex relations, $\mathcal{S}$. It is easy to
prove that $\mathcal{S}$ is stable for inverse. Indeed, let $R, S \in 2^{\mathcal{B}_n}$ be such that $R \in \mathcal{S}$ and S is
a convex relation. $S^{\smile}$ is convex too; consequently, $R \cap S^{\smile} \in \mathcal{W}$. Since $\mathcal{W}$ is stable
for inverse, we obtain $(R \cap S^{\smile})^{\smile} \in \mathcal{W}$. From the fact that $(R \cap S^{\smile})^{\smile} = R^{\smile} \cap S$,
we conclude that $R^{\smile} \in \mathcal{S}$. The proof of the stability of $\mathcal{S}$ for the operation
intersection is less direct; firstly we are going to prove the following proposition:

Proposition 9 Let $R, S \in \mathcal{S}$. Then $R \cap S \in \mathcal{W}$.

Proof. We are going to prove that $C(R \cap S)$ is a convex relation of $2^{\mathcal{B}_n}$ by
showing that $I(C(R \cap S)) = C(R \cap S)$. Let us denote by T the convex relation
$I(C(R \cap S))$. $R \cap S \subseteq T$, consequently $R \cap S \subseteq T \cap R$ and $R \cap S \subseteq T \cap S$. It
follows that $I(C(R \cap S)) \subseteq I(C(T \cap R))$ and $I(C(R \cap S)) \subseteq I(C(T \cap S))$. Hence,
we can deduce that $T \subseteq I(C(T \cap R))$ and $T \subseteq I(C(T \cap S))$. Moreover, let us
recall that T is a convex relation and R, S are two strongly-preconvex relations,
consequently $T \cap R \in \mathcal{W}$ and $T \cap S \in \mathcal{W}$. Hence, $C(T \cap R)$ and $C(T \cap S)$ are
two convex relations. From this, we deduce that $I(C(T \cap R)) = C(T \cap R)$ and
$I(C(T \cap S)) = C(T \cap S)$. Hence, $T \subseteq C(T \cap R)$ and $T \subseteq C(T \cap S)$. We deduce that
$C(T) = C(T \cap R)$ and $C(T) = C(T \cap S)$. Hence, $dim(C(T \cap R)) = dim(C(T))$,
and consequently, we obtain $dim(T) = dim(T \cap R)$. A similar line of reasoning
on S would lead to the conclusion: $dim(T) = dim(T \cap S)$. Let $p \in \mathcal{B}_n$ be such that
$p \in T$ and $dim(p) = dim(T)$. From the previous results, and proposition 1,
we can deduce that $p \in C(T \cap R)$ and $dim(p) = dim(T \cap R)$. Consequently, p
belongs to $T \cap R$, and then p belongs to R. By making the same reasoning with
S instead of R, we conclude that $p \in S$. It follows that $p \in R \cap S$. This result
and proposition 3 imply that $T \subseteq C(R \cap S)$.

It is obvious that $C(R \cap S) \subseteq I(C(R \cap S))$, so $C(R \cap S) \subseteq T$. From all this, we
conclude that $C(R \cap S) = T$, and consequently $C(R \cap S)$ is a convex relation.
Hence, $R \cap S$ belongs to $\mathcal{W}$. □

Now, we can prove a stronger result: 

Proposition 10 Let $R, S \in \mathcal{S}$. Then $R \cap S \in \mathcal{S}$.

Proof. Firstly, let us prove that $R \cap S \in \mathcal{S}$, for each relation $R, S \in 2^{\mathcal{B}_n}$ such
that $R \in \mathcal{S}$ and S is a convex relation. Let S, T be two convex relations of $2^{\mathcal{B}_n}$.
We have $(R \cap S) \cap T = R \cap (S \cap T)$. $S \cap T$ is a convex relation too, and since $R \in \mathcal{S}$,
we conclude that $R \cap (S \cap T) \in \mathcal{W}$. Consequently, $(R \cap S) \cap T$ belongs to $\mathcal{W}$ and
so $R \cap S \in \mathcal{S}$. Now, let us prove the main result. Let us suppose that $S \in \mathcal{S}$.
We have $(R \cap S) \cap T = R \cap (S \cap T)$. From the first result of this proof we have
$S \cap T \in \mathcal{S}$, consequently from proposition 9 we deduce that $R \cap (S \cap T) \in \mathcal{W}$.
We conclude that $(R \cap S) \cap T \in \mathcal{W}$, so $R \cap S \in \mathcal{S}$. □

In section E] we will see that the stability of S with the convex relations for 
the intersection is a very nice property which W does not have. 

5 Constraint Networks 

Information between n-blocks is represented by n-block constraint networks
which are particular constraint satisfaction problems. They are defined in the
following way:

Definition 6 An n-block constraint network is a structure (V, C), where
$V = \{V_1, \ldots, V_n\}$ is a set of variables which represent n-blocks, and C is a mapping
from $V \times V$ to the set of the relations in $2^{\mathcal{B}_n}$ which represent the binary
constraints between the n-blocks (we denote by $C_{ij}$ the constraint between the
variables $V_i$ and $V_j$). C is such that:

- for every $i \in \{1, \ldots, |V|\}$, $C_{ii} = \{(eq, \ldots, eq)\}$;

- for every $i, j \in \{1, \ldots, |V|\}$, $C_{ij} = C_{ji}^{\smile}$.

In fact, the constraint $C_{ij}$ represents the atomic relations which are allowed
between the two n-blocks represented by the variables $V_i$ and $V_j$. Moreover, we
note that the 1-block networks correspond to the interval networks of IA.

Fig. 4. A weakly path-consistent strongly-preconvex 3-block network. (Figure not
reproduced; one surviving edge label reads {(m,eq,p), (m,oi,p), (m,eq,o), (m,oi,o), (m,eq,m)}.)

Let $\mathcal{N} = (V, C)$ be an n-block network:






- $\mathcal{N}$ is consistent when there exists a mapping $m$ from $V$ to the n-blocks such
  that, for all $i, j \in 1, \ldots, |V|$, the two n-blocks $m_i$ and $m_j$ satisfy an atomic
  relation $p$ which belongs to the constraint $C_{ij}$. We denote the atomic
  relation satisfied between $m_i$ and $m_j$ by $m_{ij}$, and we call such a mapping
  $m$ a satisfying instantiation of $\mathcal{N}$. Moreover, when $\dim(m_{ij}) = \dim(C_{ij})$
  for all $i, j \in 1, \ldots, |V|$, we say that $m$ is a maximal instantiation of $\mathcal{N}$.

- $\mathcal{N}$ is minimal when, for all $i, j \in 1, \ldots, |V|$ and for each atomic relation
  $p \in C_{ij}$, there exists a satisfying instantiation $m$ of $\mathcal{N}$ such that $m_{ij} = p$.

- $\mathcal{N}$ is path-consistent when, for every $i, j, k \in 1, \ldots, |V|$, $C_{ij} \neq \{\}$ and
  $C_{ij} \subseteq C_{ik} \circ C_{kj}$.

- We introduce the concept of weak path-consistency in the following way:
  $\mathcal{N}$ is weakly path-consistent when, for every $i, j, k \in 1, \ldots, |V|$, $C_{ij} \neq \{\}$
  and $C_{ij} \subseteq I(C_{ik} \circ C_{kj})$. A path-consistent n-block network is weakly
  path-consistent, but the converse is not always true.

- $\mathcal{N}$ is convex (respectively saturated, weakly-preconvex, strongly-preconvex)
  when, for every $i, j \in 1, \ldots, |V|$, $C_{ij}$ is a convex (respectively saturated,
  weakly-preconvex, strongly-preconvex) relation of $2^{B_n}$.
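
The notion of a satisfying instantiation can be illustrated with a short sketch. It rests on assumptions that are not stated in this section: an n-block is encoded as a list of n intervals `(lo, hi)` with `lo < hi`, one per axis, and the thirteen atomic interval relations are abbreviated `p, m, o, s, d, f, eq` with inverses `pi, mi, oi, si, di, fi`. The function names `allen`, `block_relation` and `is_satisfying` are hypothetical.

```python
def allen(a, b):
    """Atomic interval relation satisfied by a and b, each a (lo, hi) pair with lo < hi."""
    (a1, a2), (b1, b2) = a, b
    if a2 < b1:
        return "p"                      # a precedes b
    if a2 == b1:
        return "m"                      # a meets b
    if b2 < a1:
        return "pi"
    if b2 == a1:
        return "mi"
    if a1 == b1 and a2 == b2:
        return "eq"
    if a1 == b1:
        return "s" if a2 < b2 else "si"
    if a2 == b2:
        return "f" if a1 > b1 else "fi"
    if a1 < b1:
        return "o" if a2 < b2 else "di"
    return "d" if a2 < b2 else "oi"


def block_relation(x, y):
    """Atomic n-block relation: one atomic interval relation per axis."""
    return tuple(allen(xa, ya) for xa, ya in zip(x, y))


def is_satisfying(m, C):
    """m: list of n-blocks; C: dict (i, j) -> set of atomic n-block relations."""
    return all(block_relation(m[i], m[j]) in rels for (i, j), rels in C.items())


# Two 2-blocks (rectangles) that touch along the first axis.
blocks = [[(0, 2), (0, 2)], [(2, 4), (0, 2)]]
constraints = {
    (0, 1): {("m", "eq"), ("p", "eq")},
    (1, 0): {("mi", "eq"), ("pi", "eq")},
}
print(block_relation(blocks[0], blocks[1]))
print(is_satisfying(blocks, constraints))
```

The sketch only checks a given instantiation against the constraints; it does not search for one.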

Example 9 Figure 4 represents a 3-block network $\mathcal{N} = (V, C)$ which is
strongly-preconvex, not saturated and weakly path-consistent. We
note that $\mathcal{N}$ is not path-consistent. For example, $C_{12} \not\subseteq C_{13} \circ C_{32}$ but
$C_{12} \subseteq I(C_{13} \circ C_{32})$. We do not represent the constraint $C_{ii}$ for each variable $V_i$,
nor the constraint $C_{ij}$ when the constraint $C_{ji}$ is already represented. The 3-block
network $\mathcal{N}'$ of figure 5 is a path-consistent convex 3-block network. Moreover, the
instantiation of figure 6 is a maximal satisfying instantiation $m$ of the network
$\mathcal{N}'$. The atomic relations of $\mathcal{N}'$ which are satisfied are underlined, see figure 5.

The problem of knowing whether an n-block network is consistent is very important.
Indeed, an n-block network represents coherent information if, and only if,
it is consistent. Unfortunately, this problem is NP-complete, but we will see that
for some subsets of $2^{B_n}$ this problem is polynomial. A decision method for the
issue of the consistency of an n-block network is the path-consistency polynomial
method ( 11191 ). Given an n-block network $\mathcal{N} = (V, C)$, this method consists in
successively replacing, for every $i, j, k \in 1, \ldots, |V|$, the constraint $C_{ij}$ by the
relation $C_{ij} \cap (C_{ik} \circ C_{kj})$, until the network becomes stable. The resulting
network is path-consistent or contains the empty relation, and has the same
satisfying instantiations as the initial network. This method is sound but not
complete. The weakly path-consistency polynomial method is slightly different:
it consists in successively replacing, for every $i, j, k \in 1, \ldots, |V|$, the constraint
$C_{ij}$ by the relation $C_{ij} \cap I(C_{ik} \circ C_{kj})$. It too is sound but not complete.
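
The refinement loop just described can be sketched generically. The sketch below is an illustration under stated assumptions, not the authors' implementation: the composition table of the n-block algebra is too large to reproduce here, so the three-relation point algebra is used as a stand-in, and the names `compose`, `enforce` and `network` are hypothetical. Passing the convex closure $I$ as the `closure` argument would turn the same loop into the weakly path-consistency method.

```python
from itertools import product

# Composition table of the point algebra, used only as a small stand-in
# for the composition table of the n-block algebra.
PA_TABLE = {
    ("<", "<"): {"<"}, ("<", "="): {"<"}, ("<", ">"): {"<", "=", ">"},
    ("=", "<"): {"<"}, ("=", "="): {"="}, ("=", ">"): {">"},
    (">", "<"): {"<", "=", ">"}, (">", "="): {">"}, (">", ">"): {">"},
}


def compose(r, s):
    """Composition of two relations, each given as a set of atomic relations."""
    out = set()
    for a, b in product(r, s):
        out |= PA_TABLE[a, b]
    return frozenset(out)


def enforce(C, n, closure=lambda r: r):
    """Replace C[i, j] by C[i, j] & closure(C[i, k] o C[k, j]) until stable.

    Returns the refined network, or None as soon as the empty relation appears.
    """
    C = {key: frozenset(rel) for key, rel in C.items()}
    changed = True
    while changed:
        changed = False
        for i, j, k in product(range(n), repeat=3):
            refined = C[i, j] & closure(compose(C[i, k], C[k, j]))
            if not refined:
                return None
            if refined != C[i, j]:
                C[i, j] = refined
                changed = True
    return C


def network(n, known):
    """Build a full network from a few known constraints (hypothetical helper)."""
    inverse = {"<": ">", "=": "=", ">": "<"}
    C = {(i, j): ({"="} if i == j else {"<", "=", ">"})
         for i, j in product(range(n), repeat=2)}
    for (i, j), rel in known.items():
        C[i, j] = set(rel)
        C[j, i] = {inverse[a] for a in rel}
    return C


refined = enforce(network(3, {(0, 1): "<", (1, 2): "<"}), 3)
print(sorted(refined[0, 2]))
print(enforce(network(3, {(0, 1): "<", (1, 2): "<", (2, 0): "<"}), 3))
```

The loop revisits every triple until nothing changes, which is simple rather than efficient; a queue-based variant would avoid the repeated full sweeps.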

Let $\mathcal{N} = (V, C)$ be an n-block network. In the sequel, we denote by $I(\mathcal{N})$ the
network $(V', C')$ defined by: $V' = V$ and $C'_{ij} = I(C_{ij})$, for each $i, j \in 1, \ldots, |V|$.
We denote by $\mathcal{N}_{\downarrow i}$ (with $i \in 1, \ldots, n$) the 1-block network $(V', C')$
defined by: $V' = V$ and $C'_{jk} = (C_{jk})_{\downarrow i}$, for each $j, k \in 1, \ldots, |V|$. For example,
the 3-block network $\mathcal{N}'$ of figure 5 is $I(\mathcal{N})$, with $\mathcal{N}$ being the 3-block network
of figure 4.

Fig. 5. A path-consistent convex 3-block network: $\mathcal{N}' = I(\mathcal{N})$.

Fig. 6. A maximal satisfying instantiation $m$ of the 3-block network $\mathcal{N}'$.

We can prove the following properties:



Proposition 11 Let $\mathcal{N}$ be an n-block network:

- if $\mathcal{N}$ is weakly path-consistent then $I(\mathcal{N})$ is path-consistent;

- if $\mathcal{N}$ is path-consistent then $\mathcal{N}_{\downarrow i}$ is path-consistent (with $i \in 1, \ldots, n$).

6 Results of Tractability 

In this section, we prove that the consistency problem for strongly-preconvex
n-block networks can be decided by means of the path-consistency method
or the weakly path-consistency method.

First of all, let us study the tractability of the convex n-block networks:

Theorem 1 Let $\mathcal{N} = (V, C)$ be a convex n-block network. If $\mathcal{N}$ is path-consistent
then:

(a) $\mathcal{N}$ is a consistent network;

(b) $\mathcal{N}$ is minimal;

(c) $\mathcal{N}$ has a maximal satisfying instantiation.

Proof. First, let us show that from satisfying instantiations of the $n$ 1-block
networks $\mathcal{N}_{\downarrow i}$ ($i \in 1, \ldots, n$), we can build a satisfying instantiation of $\mathcal{N}$. Let
$m^i$ be a satisfying instantiation of the network $\mathcal{N}_{\downarrow i}$ ($i \in 1, \ldots, n$). Let $m$ be
the instantiation of $\mathcal{N}$ defined in the following way: for $i \in 1, \ldots, |V|$, $m_i$ is
the n-block such that, for $j \in 1, \ldots, n$, $(m_i)_{\downarrow j}$ is the 1-block $m^j_i$ on the axis
$j$. For $i, j \in 1, \ldots, |V|$, $m_{ij}$ is the atomic relation of $B_n$: $(m^1_{ij}, \ldots, m^n_{ij})$. For
all $k \in 1, \ldots, n$, $m^k_{ij} \in (C_{ij})_{\downarrow k}$. Since $C_{ij}$ is a convex relation and a fortiori a
saturated relation, we have $(m^1_{ij}, \ldots, m^n_{ij}) \in C_{ij}$. Hence $m_{ij}$ belongs to $C_{ij}$ and
$m$ is a satisfying instantiation of $\mathcal{N}$. Moreover, it is easy to see that if, for each
$i \in 1, \ldots, n$, $m^i$ is a maximal instantiation of the network $\mathcal{N}_{\downarrow i}$, then $m$ is a
maximal satisfying instantiation of $\mathcal{N}$.

Now, from this, let us prove the results (a), (b) and (c). From proposition 11,
for each $i \in 1, \ldots, n$, $\mathcal{N}_{\downarrow i}$ is a path-consistent network of IA. Van Beek in [T^
proves that such a network is minimal. Consequently, from the first result of
this proof, we conclude that $\mathcal{N}$ is minimal and consistent. Moreover, Ligozat
in |H] proves that for each path-consistent convex network of IA there exists a
maximal satisfying instantiation. So, each network $\mathcal{N}_{\downarrow i}$ has a maximal satisfying
instantiation; consequently $\mathcal{N}$ also has a maximal satisfying instantiation. □

Hence, the path-consistency method is complete for the problem of the con- 
sistency of the convex n-block networks. We can see that it is also the case with 
the weakly path-consistency method. Now, let us consider the weakly-preconvex 
and the strongly-preconvex networks. From propositions 1 1 IL HI and the previous 
theorem, it follows: 

Corollary 1 Let $\mathcal{N}$ be a weakly-preconvex n-block network. If $\mathcal{N}$ is weakly
path-consistent then there exists a maximal satisfying instantiation of $\mathcal{N}$.

Proof. $\mathcal{N}$ is weakly path-consistent, so from proposition 11, $I(\mathcal{N})$ is
path-consistent. Consequently, there exists a maximal satisfying instantiation $m$ of
$I(\mathcal{N})$ (theorem 1). From an earlier proposition, we deduce that $m$ is also a maximal
satisfying instantiation of $\mathcal{N}$. □

Hence, the maximal satisfying instantiation $m$ (see figure 6) of the network $\mathcal{N}'$
(see figure 5) is also a maximal satisfying instantiation of the network $\mathcal{N}$ (see
figure 4). From all this it follows that:

Theorem 2 The problem of proving the consistency of strongly-preconvex
n-block networks can be decided in polynomial time by means of the weakly
path-consistency method or the path-consistency method.

Proof. Let $\mathcal{N}$ be a strongly-preconvex n-block network. Since $\mathcal{S}$ is stable under
intersection with the convex relations, applying the weakly path-consistency
method to $\mathcal{N}$ yields a network $\mathcal{N}'$ which is also a strongly-preconvex
n-block network and which has the same satisfying instantiations. If $\mathcal{N}'$ contains the
empty relation then $\mathcal{N}'$ is not consistent, and consequently $\mathcal{N}$ is not consistent either.
Otherwise, $\mathcal{N}'$ is weakly path-consistent. From the previous corollary,
it follows that $\mathcal{N}'$ has a maximal satisfying instantiation. This instantiation
satisfies $\mathcal{N}$ too. We conclude that $\mathcal{N}$ is consistent.

Since a path-consistent network is a weakly path-consistent network, we can
also conclude that the path-consistency method is complete. □

We note that the previous proof is not valid for the case of the weakly-preconvex
n-block networks (with $n > 1$). Indeed, $\mathcal{W}$ is not stable under intersection with
the convex relations of $2^{B_n}$ (with $n \geq 2$).

Hence, to find in polynomial time a satisfying instantiation of a strongly-preconvex
n-block network $\mathcal{N} = (V, C)$, we can apply the following method:

Step 1. We apply the weakly path-consistency method on $\mathcal{N}$. If $\mathcal{N}$ then contains
        the empty relation, $\mathcal{N}$ is not consistent, and we stop here.

Step 2. We compute the network $\mathcal{N}' = (V, C') = I(\mathcal{N})$.

Step 3. We compute in polynomial time a maximal satisfying instantiation
        of $\mathcal{N}'$. To this end, for each $i, j \in 1, \ldots, |V|$, we select a maximal
        atomic relation $p$ of $C'_{ij}$. Then we replace $C'_{ij}$ by $p$, and we apply the
        path-consistency method on $\mathcal{N}'$. We obtain a network $\mathcal{N}''$, which is a
        subnetwork of $\mathcal{N}'$, each constraint of which consists of a single
        maximal atomic relation of the corresponding constraint in the network $\mathcal{N}'$.

Step 4. We compute an instantiation of $\mathcal{N}''$, in polynomial time, by translating
        $\mathcal{N}''$ into the point algebra ( |lfill7IJ ) and by applying, for example, a
        topological sort. The resulting satisfying instantiation of $\mathcal{N}''$ is also a
        satisfying instantiation of $\mathcal{N}$.
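
Step 4 can be sketched for a single axis as follows. This is a minimal illustration under assumptions of ours: the network is already atomic, the interval relations use the abbreviations `p, m, o, s, d, f, eq` and their inverses with suffix `i`, and the helper name `instantiate` and the table `ENDPOINTS` are hypothetical. Each atomic relation is translated into `<` and `=` constraints on interval endpoints, equal endpoints are merged, and a topological order of the remaining strict constraints gives integer coordinates. It relies on `graphlib`, available in the Python standard library from version 3.9.

```python
from graphlib import TopologicalSorter

# Endpoint constraints entailed by each basic interval relation between
# intervals a and b ("-" is the lower endpoint, "+" the upper endpoint).
ENDPOINTS = {
    "p": [("a+", "<", "b-")],
    "m": [("a+", "=", "b-")],
    "o": [("a-", "<", "b-"), ("b-", "<", "a+"), ("a+", "<", "b+")],
    "s": [("a-", "=", "b-"), ("a+", "<", "b+")],
    "d": [("b-", "<", "a-"), ("a+", "<", "b+")],
    "f": [("b-", "<", "a-"), ("a+", "=", "b+")],
    "eq": [("a-", "=", "b-"), ("a+", "=", "b+")],
}


def instantiate(n, atomic):
    """atomic: dict (i, j) -> atomic interval relation; returns dict i -> (lo, hi)."""
    parent = {(i, e): (i, e) for i in range(n) for e in "-+"}

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    less = [((i, "-"), (i, "+")) for i in range(n)]    # lo < hi for every interval
    for (i, j), rel in atomic.items():
        if rel.endswith("i"):                          # inverse relation: swap roles
            rel, i, j = rel[:-1], j, i
        who = {"a": i, "b": j}
        for x, op, y in ENDPOINTS[rel]:
            px, py = (who[x[0]], x[1]), (who[y[0]], y[1])
            if op == "=":
                parent[find(px)] = find(py)            # merge equal endpoints
            else:
                less.append((px, py))

    sorter = TopologicalSorter({find(p): set() for p in parent})
    for x, y in less:
        sorter.add(find(y), find(x))                   # x must be placed before y
    rank = {p: r for r, p in enumerate(sorter.static_order())}
    return {i: (rank[find((i, "-"))], rank[find((i, "+"))]) for i in range(n)}


print(instantiate(3, {(0, 1): "m", (1, 2): "o", (0, 2): "p"}))
```

For an n-block network the same procedure would be run once per axis. If the strict constraints are cyclic, `static_order` raises `graphlib.CycleError`, which cannot happen for the consistent atomic networks produced by Step 3.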

Moreover, from a result in El , we can prove that the consistency problem for
the n-block networks restricted to the relations in $\hat{\mathcal{S}}$¹ is also decidable by means
of polynomial methods.

7 Conclusion 

For every $n \geq 1$, we defined the n-block algebra as a set of relations, the block
relations, together with the fundamental operations of composition, inversion and
intersection. Adapting the line of reasoning suggested by Ligozat, we introduced 
the concepts of convexity and preconvexity. In this paper, we demonstrated that 
these concepts are preserved by the fundamental operations. Our goal in writ- 
ing the paper was to provide a subclass of n-block networks whose consistency 
can be decided in polynomial time. We now know for certain that the path- 
consistency algorithm as well as the weak path-consistency algorithm constitute 
decision methods for the issue of the consistency of strongly preconvex networks 
whereas we left open the issue of the maximality of this tractable subclass of 
n-block algebra. In other respects, the question also arises as to how the qualita- 
tive constraints we have been involved in could be integrated into a more general 
setting to include metric constraints. 

¹ $\hat{\mathcal{S}}$ denotes the closure of $\mathcal{S}$ under the operations intersection, inverse and composition.

Abstract. It is believed by the scientific community that $\mathcal{PL}_1$ and $\mathcal{PL}_2$
are the largest concept languages for which there exists a polynomial
algorithm that solves the subsumption problem. This is due to Donini,
Lenzerini, Nardi, and Nutt, who have presented two tractable algorithms
that are intended to solve the subsumption problem in those languages.
In contrast, this paper proves that the algorithm for checking subsumption
of concepts expressed in the language $\mathcal{PL}_2$ is not complete. As a
direct consequence, it still remains an open problem to which computational
complexity class this subsumption problem belongs.



1 Introduction 

In the seminal paper that describes the KL-ONE system [4], Brachman and
Schmolze defined such a clean and intuitive declarative semantics that its basis
gained worldwide acceptance. As a matter of fact, they almost fixed the declarative
semantics of concept languages. All the same, no sound and complete algorithm
for computing any of the reasoning services was proposed. Besides, almost at the
same time, Brachman and Levesque directed the community’s attention to the
tradeoff between expressiveness and tractability in knowledge representation and
reasoning (nm), showing that the computational complexity of subsumption
in two very similar sublanguages of KL-ONE (named $\mathcal{FL}$ and $\mathcal{FL}^-$) belonged
to different classes. As a result, the community started tackling these problems
and many concept languages have been created and studied ever since, in order
to understand which constructors or combinations of constructors lead to
intractable or, even worse, undecidable inference problems.

As we have already mentioned, the first surprising result was due to Brachman
and Levesque, who showed that adding range restrictions to the language
$\mathcal{FL}^-$ leads to intractability of subsumption. But the main results on
undecidability came a few years later. First, Schild m) proved that the problem of
checking subsumption between two roles is undecidable in the language $\mathcal{U}$. Then,
Patel-Schneider m) and Schmidt-Schauß m) showed that subsumption is
also undecidable in the systems NIKL m) and KL-ONE (0), respectively,
where role value maps and agreements may be applied to role chains.

After these unpleasant results, a lot of concept languages have been created
and studied in order to identify the constructors or combinations of constructors
that cause reasoning problems to be intractable or undecidable
([1,5,10,14,15,16,17]). For this reason, many languages are very similar, sometimes differing
in just one constructor. One of the most significant examples is the so-called
family of $\mathcal{AL}$ languages ([2,6,7,9,12,19,22]). Each element of this class is obtained
by adding some constructors to the language $\mathcal{AL}$, which acts as a basis.
For instance, $\mathcal{ALU}$ corresponds to adding concept unions, $\mathcal{ALN}$ corresponds to
adding number restrictions, and $\mathcal{ALCR}$ is $\mathcal{AL}$ with arbitrary complements and
role conjunctions.

An interesting question concerns the optimal tradeoff between expressive
power and computational complexity. It has been studied by Donini, Lenzerini,
Nardi, and Nutt [8], who have shown that, on the one hand, subsumption is
tractable in the languages $\mathcal{PL}_1$ and $\mathcal{PL}_2$ and, on the other hand, “none of the
constructs usually considered in concept languages can be added to $\mathcal{PL}_1$ and $\mathcal{PL}_2$
without losing tractability” (in [8], page 458).

Since then, it is believed by the scientific community that $\mathcal{PL}_1$ and $\mathcal{PL}_2$ are
the largest languages for which there exists a polynomial algorithm that solves
the subsumption problem. The main aim of this paper, however, is to prove that
the algorithm presented in [8] for checking subsumption of concepts expressed
in the language $\mathcal{PL}_2$ is not complete.

This paper is organized as follows. In Section 2, we present the language $\mathcal{PL}_2$
and recall the definition of the subsumption problem. Then, Section 3 gives a
brief overview of the algorithm introduced in [8] for solving the subsumption
problem in $\mathcal{PL}_2$ and, in Section 4, we show, through an example, that this
algorithm is not complete. Finally, Section 5 contains some concluding remarks.

2 The Language $\mathcal{PL}_2$

In this section we present the language $\mathcal{PL}_2$, the coherence and the subsumption
problems, and a few basic notions.

Let (Conc, Rel) be a pair of disjoint sets of primitive concepts and primitive
relations. In the sequel, we assume that the letter $A$ denotes a primitive concept
and the letter $P$ denotes a primitive relation (i.e., $A \in$ Conc and $P \in$ Rel). In
$\mathcal{PL}_2$, a concept $C$ and a relation (or role) $R$ are defined by the following abstract
syntax rules:

$C \longrightarrow A \mid C_1 \sqcap C_2 \mid \forall R.C \mid \exists R$
$R \longrightarrow P \mid R^{-1} \mid R_1 \sqcap R_2 \mid R_1 \circ R_2$

For the sake of conciseness, concepts in $\mathcal{PL}_2$ will sometimes be referred to as
$\mathcal{PL}_2$-concepts.

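An interpretation of $\mathcal{PL}_2$ over (Conc, Rel) is a pair $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$, where
$\Delta^{\mathcal{I}}$ is a nonempty set, called the interpretation domain, and $\cdot^{\mathcal{I}}$ is the interpretation
function which satisfies the two following conditions:

- $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$, for every $A \in$ Conc; and

- $P^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$, for every $P \in$ Rel.

The interpretation function is extended to arbitrary concepts and relations as
follows:

$(C_1 \sqcap C_2)^{\mathcal{I}} = C_1^{\mathcal{I}} \cap C_2^{\mathcal{I}}$

$(\forall R.C)^{\mathcal{I}} = \{a \in \Delta^{\mathcal{I}} \mid \forall b \in \Delta^{\mathcal{I}} : (a, b) \in R^{\mathcal{I}} \rightarrow b \in C^{\mathcal{I}}\}$

$(\exists R)^{\mathcal{I}} = \{a \in \Delta^{\mathcal{I}} \mid \exists b \in \Delta^{\mathcal{I}} : (a, b) \in R^{\mathcal{I}}\}$

$(R^{-1})^{\mathcal{I}} = \{(a, b) \in \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}} \mid (b, a) \in R^{\mathcal{I}}\}$

$(R_1 \sqcap R_2)^{\mathcal{I}} = R_1^{\mathcal{I}} \cap R_2^{\mathcal{I}}$

$(R_1 \circ R_2)^{\mathcal{I}} = \{(a, c) \in \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}} \mid \exists b \in \Delta^{\mathcal{I}} : (a, b) \in R_1^{\mathcal{I}} \text{ and } (b, c) \in R_2^{\mathcal{I}}\}$.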

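In order to simplify the rest of the paper, we shall assume that the inverse
operator is only applied to primitive relations. Notice that this does not affect
the expressive power of the concept language, due to the following equalities,
which hold for all relations $R$, $R_1$, $R_2$ and every interpretation $\mathcal{I}$:

$((R^{-1})^{-1})^{\mathcal{I}} = R^{\mathcal{I}}$,

$((R_1 \sqcap R_2)^{-1})^{\mathcal{I}} = (R_1^{-1} \sqcap R_2^{-1})^{\mathcal{I}}$, and

$((R_1 \circ R_2)^{-1})^{\mathcal{I}} = (R_2^{-1} \circ R_1^{-1})^{\mathcal{I}}$.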

An interpretation $\mathcal{I}$ is said to be a model of a concept $C$ if and only if $C^{\mathcal{I}} \neq \emptyset$.
Besides, a concept is coherent (or satisfiable) if it has a model, and is incoherent
(or unsatisfiable) otherwise. Now, let $C_1$ and $C_2$ be two concepts. We say that
$C_1$ subsumes $C_2$ if and only if $C_2^{\mathcal{I}} \subseteq C_1^{\mathcal{I}}$, for every interpretation $\mathcal{I}$, and that $C_1$
and $C_2$ are equivalent if and only if $C_1^{\mathcal{I}} = C_2^{\mathcal{I}}$, for every interpretation $\mathcal{I}$. Given
a concept $C$, the coherence problem consists in checking whether $C$ is coherent.
Similarly, given two concepts $C_1$ and $C_2$, the subsumption problem consists in
checking whether $C_1$ subsumes $C_2$.

In many concept languages, there is another sort of structured concepts,
called negations or complements, whose form is $\neg C$. For every interpretation $\mathcal{I}$,
one defines:

$(\neg C)^{\mathcal{I}} = \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$.

When conjunctions and general complements are allowed, the subsumption problem
can be reduced to (the complement of) the coherence problem, because
a concept $C_1$ subsumes a concept $C_2$ if and only if the concept $\neg C_1 \sqcap C_2$ is
incoherent.

3 Checking Subsumption in $\mathcal{PL}_2$

Now, let us give a brief overview of the algorithm described in [8] for checking
subsumption of $\mathcal{PL}_2$-concepts.

Although the concept constructor $\neg$ is not available in $\mathcal{PL}_2$, the algorithm
relies on the equivalence

$C_1$ subsumes $C_2$ iff $\neg C_1 \sqcap C_2$ is incoherent,

and therefore involves checking the coherence of some concepts that actually do not
belong to the language. For the sake of simplicity, negated concepts are assumed to be
intersection-free, which means that the concept constructor $\sqcap$ does not occur in any
concept of the form $\neg C$. The idea is to transform the first concept $C_1$ into a
conjunction $C_{11} \sqcap \cdots \sqcap C_{1n}$ of $n$ intersection-free concepts and then to solve $n$
coherence problems:

$\neg C_{1k} \sqcap C_2$, for $k = 1, \ldots, n$.

The justification for this procedure follows directly from Proposition 1 (whose
easy proof is left out).

Proposition 1. The two following propositions hold.

- A concept $\forall R.(C_1 \sqcap C_2)$ is equivalent to the concept $(\forall R.C_1) \sqcap (\forall R.C_2)$.

- A concept $\neg(C_{11} \sqcap \cdots \sqcap C_{1n}) \sqcap C_2$, where $n \geq 1$, is incoherent if and only if
  all concepts $\neg C_{1k} \sqcap C_2$ are incoherent, for $k = 1, \ldots, n$.

To cope with negations (of primitive concepts, universally quantified concepts,
or existentially quantified concepts), Donini and his colleagues have considered
another concept language, $\mathcal{PL}_2^{\neg}$, which extends $\mathcal{PL}_2$. A concept in $\mathcal{PL}_2^{\neg}$
(sometimes called a $\mathcal{PL}_2^{\neg}$-concept) is a concept of the form:

$C_2$, $\neg C_1$, or $\neg C_1 \sqcap C_2$,

where $C_2$ is a $\mathcal{PL}_2$-concept and $C_1$ is an intersection-free $\mathcal{PL}_2$-concept.

The algorithm for checking coherence deals with constraint systems. So, our
next step is to recall this notion (introduced in [22]).

Let Var be a set of variables, disjoint from Conc and from Rel. We assume
that the letter $x$ will always denote a variable. A constraint $\sigma$ is an expression
of the form $x : C$ or $R(x_1, x_2)$, where $x, x_1, x_2 \in$ Var, $C$ is a concept, and $R$ is a
relation. A constraint system $\Sigma$ is a finite and nonempty set of constraints.

Given an interpretation $\mathcal{I}$ over (Conc, Rel), an $\mathcal{I}$-assignment $\alpha$ is a function
from Var to $\Delta^{\mathcal{I}}$. Moreover, an $\mathcal{I}$-assignment $\alpha$ satisfies:

- a constraint $x : C$ iff $\alpha(x) \in C^{\mathcal{I}}$;

- a constraint $R(x_1, x_2)$ iff $(\alpha(x_1), \alpha(x_2)) \in R^{\mathcal{I}}$; and

- a constraint system $\Sigma$ iff $\alpha$ satisfies every constraint $\sigma \in \Sigma$.

A constraint system $\Sigma$ is said to be satisfiable if there is an interpretation $\mathcal{I}$ and an
$\mathcal{I}$-assignment $\alpha$ that satisfies $\Sigma$, and unsatisfiable otherwise. The usefulness of
constraint systems is due to the next result, proved in [22].

Proposition 2. Let $C$ be a concept and $x$ be a variable. Then, $C$ is coherent if
and only if the constraint system $\{x : C\}$ is satisfiable.

Roughly speaking, the algorithm for checking coherence of $\mathcal{PL}_2^{\neg}$-concepts
starts with a constraint system of the form $\Sigma = \{x : C\}$, into which it adds
new $\mathcal{PL}_2^{\neg}$-constraints, until no more constraints can be generated. But before
presenting the rules for generating constraints, we need to introduce another
definition.

A constraint $R(x_1, x_2)$ holds in a $\mathcal{PL}_2^{\neg}$-constraint system $\Sigma$ if and only if:

- $R$ is a primitive relation $P$ and $P(x_1, x_2) \in \Sigma$, or

- $R$ has the form $P^{-1}$ and $P(x_2, x_1) \in \Sigma$, or

- $R$ has the form $R_1 \sqcap R_2$ and both $R_1(x_1, x_2)$ and $R_2(x_1, x_2)$ hold in $\Sigma$, or

- $R$ has the form $R_1 \circ R_2$ and there is a variable $x_3$ such that both $R_1(x_1, x_3)$
  and $R_2(x_3, x_2)$ hold in $\Sigma$.

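The following six rules are called completion rules, where $\Sigma$ is a
$\mathcal{PL}_2^{\neg}$-constraint system.

($\sqcap_c$-rule) If $x : C_1 \sqcap C_2 \in \Sigma$,
    and either $x : C_1 \notin \Sigma$ or $x : C_2 \notin \Sigma$,
    then $\Sigma \leftarrow \Sigma \cup \{x : C_1, x : C_2\}$.

($\forall$-rule) If $x_1 : \forall R.C \in \Sigma$, $R(x_1, x_2)$ holds in $\Sigma$,
    and $x_2 : C \notin \Sigma$,
    then $\Sigma \leftarrow \Sigma \cup \{x_2 : C\}$.

($\neg\forall$-rule) If $x_1 : \neg\forall R.C \in \Sigma$
    and there is no variable $x$ such that $x : \neg C \in \Sigma$,
    then $\Sigma \leftarrow \Sigma \cup \{R(x_1, x_2), x_2 : \neg C\}$, where $x_2$ is a new variable.

($^{-1}$-rule) If $P^{-1}(x_1, x_2) \in \Sigma$
    and $P(x_2, x_1) \notin \Sigma$,
    then $\Sigma \leftarrow \Sigma \cup \{P(x_2, x_1)\}$.

($\sqcap_r$-rule) If $(R_1 \sqcap R_2)(x_1, x_2) \in \Sigma$,
    and either $R_1(x_1, x_2) \notin \Sigma$ or $R_2(x_1, x_2) \notin \Sigma$,
    then $\Sigma \leftarrow \Sigma \cup \{R_1(x_1, x_2), R_2(x_1, x_2)\}$.

($\circ$-rule) If $(R_1 \circ R_2)(x_1, x_2) \in \Sigma$
    and there is no variable $x$ such that $\{R_1(x_1, x), R_2(x, x_2)\} \subseteq \Sigma$,
    then $\Sigma \leftarrow \Sigma \cup \{R_1(x_1, x_3), R_2(x_3, x_2)\}$, where $x_3$ is a new variable.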

Let $\Sigma$ be a $\mathcal{PL}_2^{\neg}$-constraint system. A completion of $\Sigma$, often represented by
$\overline{\Sigma}$, is a superset of $\Sigma$ obtained by applying the completion rules, in any order,
until none of these rules can be applied.

Completions of $\mathcal{PL}_2^{\neg}$-constraint systems verify two important properties.
First of all, they are $\mathcal{PL}_2^{\neg}$-constraint systems. So, in particular, they are finite
sets. Second, completions preserve the satisfiability of the initial constraint
system. Therefore, we may state the following proposition.

Proposition 3. Let $\Sigma$ be a $\mathcal{PL}_2^{\neg}$-constraint system and $\overline{\Sigma}$ be a completion of
$\Sigma$. Then, $\Sigma$ is satisfiable if and only if $\overline{\Sigma}$ is satisfiable.

The last question is how to detect (un)satisfiability of $\mathcal{PL}_2^{\neg}$-constraint systems.
The usual procedure is to define the notion of clash, which corresponds,
intuitively, to the notion of “obvious” contradiction. In the work reported in [8],
a clash is defined as a set of constraints of the form:

- $\{x : A, x : \neg A\}$; or

- $\{x : \exists R_1, x : \neg\exists R_2\}$, provided that the completion of the constraint system
  $\{R_1(x, x_1), x : \forall R_2.A\}$ has a constraint of the form $x_2 : A$,

where $x_1$ is a new symbol that denotes a variable, $A$ is a new symbol that denotes
a primitive concept, and $x_2$ is a variable.

We may finally present a key result of [8], which may be rephrased as follows.

Claim (Theorem 5.2 of [8]) Let $C_1$ be an intersection-free $\mathcal{PL}_2$-concept, $C_2$ be
a $\mathcal{PL}_2$-concept, and $\overline{\Sigma}$ be a completion of the $\mathcal{PL}_2^{\neg}$-constraint system

$\{x : \neg C_1 \sqcap C_2\}$.

Then, $\overline{\Sigma}$ is satisfiable if and only if $\overline{\Sigma}$ does not contain any clash.

4 The Incompleteness of the Algorithm 

We will prove that the algorithm presented in the previous section is not complete
by means of a simple example. So, let us consider the two following concepts:

$C_1 = \exists(P_1 \circ P_2)$ and $C_2 = (\exists P_1) \sqcap (\forall P_1.\exists P_2)$.

In the first place, let us verify that $C_1$ subsumes $C_2$. To this end, let $\mathcal{I}$ be
an interpretation and $a$ be an element of $\Delta^{\mathcal{I}}$ such that $a \in C_2^{\mathcal{I}}$. By definition
of interpretation, $a \in (\exists P_1)^{\mathcal{I}}$ and $a \in (\forall P_1.\exists P_2)^{\mathcal{I}}$, and, in particular, there
must be an element $b \in \Delta^{\mathcal{I}}$ such that $(a, b) \in P_1^{\mathcal{I}}$. But, then, $b \in (\exists P_2)^{\mathcal{I}}$,
which implies that there is an element $c \in \Delta^{\mathcal{I}}$ such that $(b, c) \in P_2^{\mathcal{I}}$. Therefore,
$(a, c) \in (P_1 \circ P_2)^{\mathcal{I}}$, allowing us to conclude that $a \in (\exists(P_1 \circ P_2))^{\mathcal{I}}$, i.e., that
$a \in C_1^{\mathcal{I}}$, as we wanted.
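
The semantic argument above can be cross-checked mechanically on small finite interpretations. The sketch below is only a sanity check, not a proof: it enumerates every interpretation of $P_1$ and $P_2$ over domains with at most two elements and tests $C_2^{\mathcal{I}} \subseteq C_1^{\mathcal{I}}$. The tuple encoding of concepts and roles and all function names are assumptions made for this illustration.

```python
from itertools import chain, combinations, product


def role_ext(R, dom, rel):
    """Extension of a role under the interpretation given by `rel`."""
    kind = R[0]
    if kind == "prim":
        return rel[R[1]]
    if kind == "inv":                       # inverse of a primitive relation
        return {(b, a) for (a, b) in rel[R[1]]}
    if kind == "rand":
        return role_ext(R[1], dom, rel) & role_ext(R[2], dom, rel)
    if kind == "comp":
        r1, r2 = role_ext(R[1], dom, rel), role_ext(R[2], dom, rel)
        return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}
    raise ValueError(R)


def concept_ext(C, dom, conc, rel):
    """Extension of a concept, following the semantic equations of Section 2."""
    kind = C[0]
    if kind == "prim":
        return conc[C[1]]
    if kind == "and":
        return concept_ext(C[1], dom, conc, rel) & concept_ext(C[2], dom, conc, rel)
    if kind == "some":
        return {a for (a, _) in role_ext(C[1], dom, rel)}
    if kind == "all":
        r, c = role_ext(C[1], dom, rel), concept_ext(C[2], dom, conc, rel)
        return {a for a in dom if all(b in c for (a2, b) in r if a2 == a)}
    raise ValueError(C)


def subsets(xs):
    xs = list(xs)
    return [set(s) for s in chain.from_iterable(
        combinations(xs, k) for k in range(len(xs) + 1))]


P1, P2 = ("prim", "P1"), ("prim", "P2")
C1 = ("some", ("comp", P1, P2))
C2 = ("and", ("some", P1), ("all", P1, ("some", P2)))


def subsumes_on_small_domains(c1, c2, max_size=2):
    """True if c2's extension is included in c1's for every tested interpretation."""
    for size in range(1, max_size + 1):
        dom = set(range(size))
        pairs = list(product(dom, repeat=2))
        for p1, p2 in product(subsets(pairs), repeat=2):
            rel = {"P1": p1, "P2": p2}
            if not concept_ext(c2, dom, {}, rel) <= concept_ext(c1, dom, {}, rel):
                return False
    return True


print(subsumes_on_small_domains(C1, C2))
print(subsumes_on_small_domains(C2, C1))
```

The second call illustrates that the converse inclusion fails already on a two-element domain, so the two concepts are not equivalent.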

Now, let us examine how the algorithm described in the previous section
would work with the concepts $C_1$ and $C_2$. In this case, since $C_1$ is an
intersection-free concept, the subsumption problem would be reduced to the problem of
testing if the following constraint system is unsatisfiable (where $x$ is a variable):

$\Sigma = \{x : (\neg\exists(P_1 \circ P_2)) \sqcap ((\exists P_1) \sqcap (\forall P_1.\exists P_2))\}$.

Let us then compute a completion $\overline{\Sigma}$ of $\Sigma$. Remark that the ($\sqcap_c$-rule) is the only
rule that can be applied, giving rise to the constraints:

$x : \neg\exists(P_1 \circ P_2)$ and $x : (\exists P_1) \sqcap (\forall P_1.\exists P_2)$,

on its first application, and to:

$x : \exists P_1$ and $x : \forall P_1.\exists P_2$,

on the second. That is, $\overline{\Sigma}$ has only five constraints.

In order to test if $\overline{\Sigma}$ has a clash, we must analyse the set

$\{x : \exists P_1, x : \neg\exists(P_1 \circ P_2)\}$,

and compute a completion $\overline{\Sigma'}$ of the constraint system

$\Sigma' = \{P_1(x, x_1), x : \forall(P_1 \circ P_2).A\}$.

But $\overline{\Sigma'} = \Sigma'$ because no rule can be applied to $\Sigma'$ and, in particular, no
constraint of the form $x' : A$ belongs to $\overline{\Sigma'}$. Therefore, $\overline{\Sigma}$ contains no clash and,
according to the Claim, one may conclude that $\overline{\Sigma}$ is satisfiable. Consequently, by
Proposition 3, $\Sigma = \{x : \neg C_1 \sqcap C_2\}$ is satisfiable and, by Proposition 2, $\neg C_1 \sqcap C_2$
is coherent, which is equivalent to stating that $C_1$ does not subsume $C_2$.
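
The run of the algorithm on this example can be replayed with a small sketch. It is our reading of the completion rules and of the clash definition as recalled in Section 3, not the implementation of [8]; the tuple encoding, the fresh-variable names and the functions `variables`, `holds`, `complete` and `has_clash` are all hypothetical.

```python
from itertools import count

# Concepts:    ("prim", A), ("and", C, D), ("all", R, C), ("some", R), ("not", C)
# Roles:       ("prim", P), ("inv", P), ("rand", R1, R2), ("comp", R1, R2)
# Constraints: ("in", x, C) for x : C and ("rel", R, x, y) for R(x, y)


def variables(S):
    return {v for c in S for v in (c[1:2] if c[0] == "in" else c[2:4])}


def holds(R, x, y, S):
    if R[0] == "prim":
        return ("rel", R, x, y) in S
    if R[0] == "inv":
        return ("rel", ("prim", R[1]), y, x) in S
    if R[0] == "rand":
        return holds(R[1], x, y, S) and holds(R[2], x, y, S)
    return any(holds(R[1], x, z, S) and holds(R[2], z, y, S) for z in variables(S))


def complete(S):
    """Apply the six completion rules until none of them adds a constraint."""
    S, fresh = set(S), count()
    changed = True
    while changed:
        changed = False
        for c in list(S):
            new = set()
            if c[0] == "in":
                _, x, C = c
                if C[0] == "and":
                    new = {("in", x, C[1]), ("in", x, C[2])}
                elif C[0] == "all":
                    new = {("in", y, C[2]) for y in variables(S) if holds(C[1], x, y, S)}
                elif C[0] == "not" and C[1][0] == "all":
                    R, D = C[1][1], C[1][2]
                    if not any(d[0] == "in" and d[2] == ("not", D) for d in S):
                        y = f"_v{next(fresh)}"
                        new = {("rel", R, x, y), ("in", y, ("not", D))}
            else:
                _, R, x, y = c
                if R[0] == "inv":
                    new = {("rel", ("prim", R[1]), y, x)}
                elif R[0] == "rand":
                    new = {("rel", R[1], x, y), ("rel", R[2], x, y)}
                elif R[0] == "comp" and not any(
                        ("rel", R[1], x, z) in S and ("rel", R[2], z, y) in S
                        for z in variables(S)):
                    z = f"_v{next(fresh)}"
                    new = {("rel", R[1], x, z), ("rel", R[2], z, y)}
            if not new <= S:
                S |= new
                changed = True
    return S


def has_clash(S):
    A = ("prim", "_A")                       # the new primitive concept of the definition
    for c in S:
        if c[0] != "in":
            continue
        _, x, C = c
        if C[0] == "prim" and ("in", x, ("not", C)) in S:
            return True
        if C[0] == "some":
            for d in S:
                if d[0] == "in" and d[1] == x and d[2][0] == "not" and d[2][1][0] == "some":
                    T = complete({("rel", C[1], x, "_x1"), ("in", x, ("all", d[2][1][1], A))})
                    if any(e[0] == "in" and e[2] == A for e in T):
                        return True
    return False


P1, P2 = ("prim", "P1"), ("prim", "P2")
C1 = ("some", ("comp", P1, P2))
C2 = ("and", ("some", P1), ("all", P1, ("some", P2)))
completion = complete({("in", "x", ("and", ("not", C1), C2))})
print(len(completion), has_clash(completion))
```

Under this reading the completion contains five constraints and no clash, in line with the computation above, even though $C_1$ does subsume $C_2$.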

The conclusion is that the Claim (Theorem 5.2 of [8]) is not true. The main
issue, however, is that we are led to believe that Theorem 5.3 of [8], which
states that the subsumption problem in $\mathcal{PL}_2$ can be solved in polynomial time,
is not true either. The point is that the usual rule for dealing with existentially
quantified constraints seems to be unavoidable and, as was pointed out by
Donini and his colleagues, “even if one generates only one variable for each
constraint of the form $x : \exists R$ (as done in TCP ), the presence of role conjunction
may lead to an exponential number of variables in the completion of the system”
(in [8], page 462).

5 Conclusion 

This paper aims at proving that the tractable algorithm presented in [8] for
solving the subsumption problem in $\mathcal{PL}_2$ is not complete. To this end, a concrete
example for which the algorithm fails to detect concept subsumption has been
introduced and examined in detail. This result was also achieved in [23], where
a less simple example was used.

Concerning future work, we plan to determine the computational complexity
class to which the subsumption problem in $\mathcal{PL}_2$ belongs. Taking into account
this new result, and as far as we know, this problem remains open.
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Abstract. This paper explains how a form of picture retrieval can be 
used to allow a document generation system to include pictures in generated
documents. The exposition is based on the What You See Is What
You Meant (WYSIWYM) approach to document generation [15,16]. The
approach outlined in the paper makes use of a library of pictures, each
of which is associated with a WYSIWYM-generated set of logical representations
that formalize the intended meaning of the picture. The paper
focuses on the inclusion of picture sequences into generated documents, 
and on coreference relations between different pictures in a sequence. 



Keywords: Document Generation, Multimedia Coreference, Formal Semantics 

1 Introduction 

Research on Document Generation has started to involve more than language
generation alone, focusing not only on putting information into words, but also
on putting information into pictures and laying out the words and pictures on
a page [13]. Some of this research has focused on pictures that are themselves
generated from smaller components, making use of a simple compositional
semantics (e.g. [21]). In many practical applications, however, a purely generative
approach is not feasible. The present investigation will take an application of
this kind as its point of departure. The documents generated are pharmaceutical
Patient Information Leaflets (PILLs), as exemplified by the leaflets in [1],
around 60% of which contain pictures.

Many pictures in the PILLs corpus (e.g. photographs of medicine packages) are
so complex that it would be difficult to subject them to a compositional analysis
and to generate them from smaller parts. Even where generation is possible, it
is not practically feasible because of the sparseness of the pictures occurring in
the corpus. (Compare [18] for analogous arguments in connection with language
generation.) Each picture can be thought of as belonging to a family of pictures
that are sufficiently similar to be generated by the same grammar. One family
consists of pictures of medicine packages, another consists of pictures of body
parts, etc. The problem is that the number of families in the corpus is large,
while each of them has only a small number of elements, which would force one
to build a large number of different picture generators, each of which would be
called only infrequently.

Luckily there also seems to be little need for genuine generation in this area,
since the total number of different pictures used in the PILLs produced by any
one pharmaceutical company is limited. In addition, most pictures are used in
several leaflets. This suggests that it would be attractive to let a document
generation program select pictures from a library, in which each picture is coupled
with a formal representation to characterize what the picture intends to convey.
One way in which this setup can be exploited is by allowing an author to specify
the content of each picture as well as each leaflet and to determine what parts of
the leaflet are in need of pictorial illustration. (Other approaches, some of which
grant more of the initiative to the system, are briefly discussed in Section 5.)
This idea, first introduced in [4] and [5], will be further explored in the
remainder of this paper. The focus will be on how to represent and include coherent
sequences of pictures, in such a way that coreference relations between different
pictures in the sequence are reflected properly.

In Section 2, the ‘What You See Is What You Meant’ (WYSIWYM) approach
to knowledge editing and document generation is introduced. In Section 3, we
sketch how WYSIWYM is being extended to allow the inclusion of pictorial
information in a generated document. Section 4 discusses how this basic scheme may
be applied to sequences of pictures. Section 5 discusses a number of extensions
of the method described in earlier sections. Section 6 draws conclusions.

2 WYSIWYM for Text Generation 

Elsewhere ([15,16]), a new knowledge-editing method called ‘WYSIWYM editing’
has been introduced and motivated. WYSIWYM editing allows a domain expert
to edit a knowledge base (KB) reliably by interacting with a feedback text,
generated by the system, which presents both the knowledge already defined and
the options for extending and modifying it. Knowledge is added or modified by
menu-based choices which directly affect the knowledge base; the result is
displayed to the author by means of an automatically generated feedback text: thus
‘What You See Is What You Meant’. WYSIWYM instantiates a general recent
trend in dialogue systems towards moving some of the initiative from the user
to the system, allowing such systems to avoid ‘open’ (i.e., unconstrained) input.

Various applications of Wysiwym are currently being explored. The present pa- 
per is concerned with applications of wysiwym to document generation: The kb 
created with the help of wysiwym is used as input to a natural language gene- 
ration (nlg) program, producing as output a document of some sort (e.g., a user 
manual), for the benefit of an end user. Applications of wysiwym to document 
generation combine a KL-ONE-type knowledge representation language with two 
NLG systems, implemented as two different modes of the same generator. One
NLG system generates feedback texts (for the author) and the other output texts 
(for an end user). One wysiwym application that is currently under develop- 
ment has the creation of Patient Information Leaflets (pills) as its domain. The 
present, limited version of this system allows authors to enter information about 
possible side-effects of taking a medicine (‘If you are either pregnant or allergic
to penicillin, then tell your doctor’) and about how to handle medical devices
such as inhalers, inoculators, etc. It is this ‘pills’ version of wysiwym that we 
will have in mind in the remainder of this paper, so let us sketch its outlines. 
By interacting with the feedback texts generated by the system, the author can 
define a procedure for performing a task, e.g. cleaning an inhaler. When a new 
knowledge base is created, a procedure instance is created, e.g. proc1. The
permanent part of the kb (i.e., the T box) specifies that every procedure has two
attributes: a goal, and a method. This information is conveyed to the author
through a feedback text: ‘Achieve this goal by applying this method’. Not yet
defined attributes are shown through anchors. A boldface anchor indicates that
the attribute must be specified. An italicized anchor indicates that the attribute 
is optional. All anchors are mouse-sensitive. By clicking on an anchor, the aut- 
hor obtains a pop-up menu listing the permissible values of the attribute; by 
selecting one of these options, the author updates the knowledge base. 

Clicking on this goal yields a pop-up menu that lists all the types of actions that 
the system knows about: e.g., clean, store, dry-in-air, inhale, etc., from which 
the author selects ‘clean’. The program responds by creating a new instance, of
type clean, and adding it to the kb as the value of the goal attribute on proc1:

procedure(proc1).
goal(proc1, clean1).
clean(clean1).

From the updated knowledge base, the generator produces a new feedback text 
Clean this device or device-part by applying this method. 

including an anchor representing the undefined Actee attribute on the clean1
instance. By continuing to make choices at anchors, the author might expand 
the knowledge base in the following sequence: 

— Clean this device or device-part by applying this method. 

— Clean this device or device-part by performing this action (further
actions).

— Clean this inhaler by performing this action (further actions).

— Clean this inhaler by rinsing this device or device-part (further actions).

— Clean this inhaler by rinsing it (further actions).

At this point the knowledge base is potentially complete (no boldface anchors 
remain), so an output text can be generated and incorporated into the pill 
leaflet. For example, ‘To clean your inhaler, you may rinse it’. By expanding 
or otherwise changing (e.g. cutting and pasting) the feedback text, other, more 
interesting output texts can be obtained. 
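
The editing cycle just described can be illustrated with a minimal sketch. It is not the pills implementation: the T box fragment, the choice of which attributes are obligatory, and the names TBOX, Instance, anchors and potentially_complete are all assumptions made for illustration. The sketch replays the author's choices and checks when the knowledge base becomes potentially complete.

```python
from dataclasses import dataclass, field

# Hypothetical T-box fragment: type -> {attribute: obligatory?}.
# Which attributes are obligatory is an assumption made for illustration.
TBOX = {
    "procedure": {"goal": True, "method": True},
    "clean": {"actee": True},
    "rinse": {"actee": True, "next": False},  # "next" stands for (further actions)
    "inhaler": {},
}


@dataclass
class Instance:
    name: str
    type: str
    attrs: dict = field(default_factory=dict)  # attribute -> Instance


def anchors(inst):
    """Yield (instance name, attribute, obligatory?) for each undefined attribute."""
    for attr, obligatory in TBOX[inst.type].items():
        value = inst.attrs.get(attr)
        if value is None:
            yield (inst.name, attr, obligatory)
        else:
            yield from anchors(value)


def potentially_complete(inst):
    """True when no obligatory (boldface) anchor remains below inst."""
    return not any(obligatory for _, _, obligatory in anchors(inst))


# Replay the author's menu choices from the example.
proc1 = Instance("proc1", "procedure")
clean1 = Instance("clean1", "clean")
proc1.attrs["goal"] = clean1            # goal(proc1, clean1)
inhaler1 = Instance("inhaler1", "inhaler")
clean1.attrs["actee"] = inhaler1        # "Clean this inhaler ..."
rinse1 = Instance("rinse1", "rinse")
proc1.attrs["method"] = rinse1          # "... by rinsing ..."
before = potentially_complete(proc1)    # rinse1 still lacks its actee
rinse1.attrs["actee"] = inhaler1        # "... by rinsing it"
after = potentially_complete(proc1)
print(before, after, list(anchors(proc1)))
```

In this sketch the only anchor left at the end is the optional one for further actions, which corresponds to the point at which an output text can be generated.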

3 An Extension: WYSIWYM for Text Plus Pictures

We will sketch an extension of text-based wysiwym systems in which wysiwym 
is employed to create output documents that contain pictures as well as words. 
Following [4,5], a number of simplifying assumptions will be made: (1) Pictures 
never express truthconditional information that is not expressed in the text; (2) 
The author (rather than the system) decides what parts of the kb are in need of 
pictorial illustration; (3) The author indicates his/her decision by highlighting a 
span of output text that describes one single action. These simplifying assump- 
tions will be relinquished in the discussion of Section 5. 

The idea is to allow authors to indicate, for a given (mouse-sensitive) stretch s of 
text, whether or not they would like to see s illustrated. If yes, then the system 
will search its pictorial library to find a picture that ‘matches’ the meaning of 
s. Determining this on the basis of the pictures alone would be extremely diffi- 
cult. It would be possible to index the pictures making use of keywords, possibly 
making use of an existing classification scheme such as Iconclass [20]. What we 
propose is to use a more precise, logically oriented classification scheme in which 
logical formulas represent the intended meaning of each picture. Determining 
the intended meaning of a given picture can be very difficult, as is well known 
[6], but we will make use of the following ‘Closed World’ Assumption: 

If the truth of a proposition p is implied by the picture and p is expres- 
sible by the knowledge representation language (defined by the T box), 
then p is part of the semantic representation of the picture. 

This implies, for example, that occluded objects (e.g., parts of the face that are 
hidden by a hand) will be represented formally if the picture is normally inter- 
preted as implying their existence. Missing objects (e.g., the cap of an inhaler, 
if the picture shows an inhaler without a cap) will not be represented. 

Given a representation in the kb, the system will try to find pictures in the 
library that match the representation. If a nonempty set of matching pictures is 
found, one picture is chosen randomly from this set. (Alternatively, all matching
pictures may be shown, from which the user can make a selection.) If the set 
is empty, the system tells the author that no illustration is currently possible, 
which can be a reason for expanding the library. A simple system with this fun- 
ctionality has been implemented. Let us look at an example, adapted from a 
leaflet describing the ‘Zovirax’ cream [1]: 

1. You may feel a tingling sensation on your lip.

2. A blister may develop there.

3. Apply some cream to the blister.



This leaflet exemplifies the fact that pictures often come in sequences, each 
member of which illustrates a textual clause presented in an ordered list. Clause 
1 is illustrated by an image that tries to depict the occurrence of the sensation. 

Fig. 1. Illustration for clause 1



Clause 2 is illustrated by a picture showing a blister at the location where the
sensation occurred (Fig. 2), while clause 3 is illustrated by a picture showing
the cream being applied to the same location (Fig. 3).

Fig. 2. Illustration for clause 2

Fig. 3. Illustration for clause 3

It has been observed that photographic pictures express ‘vivid’ information and
that such information can be expressed by a conjunction of positive literals
[12]. In what follows, we will use a notation that matches this observation,
using a fragment of predicate logic that is easily translated into the semantic
networks used in existing wysiwym systems. Thus, we will write φ(x, y) in the
representation associated with a picture to assert that there are x and y such
that φ(x, y) is true. All variables are therefore interpreted as if they were
governed by an existential quantifier taking scope over (at least) the entire
representation. Assume Ss is the part of the database for which a pictorial
illustration is requested. The representations in the kb can be rendered in
logical notation as follows:

type_0(e) &
role_1(e) = x_1 & ... & role_n(e) = x_n &
type_1(x_1) & ... & type_n(x_n), etc.

where each of e, x_1, ..., x_n is either a variable or a constant. This notation
reflects the reliance, in the semantic nets used in [15, 16], on instances,
types, and attribute/value pairs. Each instance has one (1-ary) property, called
its type, and can be the argument of any number of attributes, whose values are
instances again. Instances are rendered as variables or constants, types are
denoted by the predicates type_i, and attributes are denoted by the functions
role_i (e.g. Actor, Actee, Target, as in case-based approaches to grammar, e.g.
[8]). In the case of an action e, its type, type_0, corresponds roughly with the
meaning of a verb, saying what kind of action e is. The values of its attributes
(x_i in the formula above), each of which can be either a variable or a
constant, can be of any type type_i (e.g., a person, a medicine, or even an
action) and each of them can have
other attributes, and so on. 
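
To make the correspondence between semantic nets and this notation concrete, here is a small sketch that flattens a net of instances, types and attribute/value pairs into a conjunction of positive literals. The dictionary layout of NET and the function name literals are assumptions made for illustration; the instance names anticipate the cream example discussed next.

```python
# Hypothetical semantic net: instance -> (type, {attribute: value instance}).
NET = {
    "e3": ("Apply", {"Actor": "p1", "Actee": "c3", "Target": "b2"}),
    "p1": ("Reader", {}),
    "c3": ("Cream", {}),
    "b2": ("Blister", {}),
}


def literals(net, root):
    """Collect the positive literals reachable from root, breadth first:
    the type of an instance, then its attributes, then the types of the values."""
    out, seen, todo = [], set(), [root]
    while todo:
        inst = todo.pop(0)
        if inst in seen:
            continue
        seen.add(inst)
        type_, attrs = net[inst]
        out.append(f"{type_}({inst})")
        for role, value in attrs.items():
            out.append(f"{role}({inst}) = {value}")
            todo.append(value)
    return out


print(" & ".join(literals(NET, "e3")))
```

The output follows the pattern given above: first the type of the event, then its roles, then the types of the role values.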

Let us return to our example to see how the semantic content of the pictures 
and that of the text may be expressed formally. The texts, stripped of modal and 
temporal operators, express something along the following lines. (‘Exp’ abbre- 
viates ‘Experiencer’; the lips are left out of the representations for simplicity.) 

1. Tingle(e_1) & Exp(e_1) = p_1 & Reader(p_1)

2. Develop(e_2) & Result(e_2) = b_2 & Blister(b_2) & Exp(e_2) = p_1

3. Apply(e_3) & Actor(e_3) = p_1 & Actee(e_3) = c_3 & Cream(c_3) & Target(e_3) = b_2

The third and most complex representation, for example, says that there is an 
event of type Apply, whose Actor is of type Reader, whose Actee is of type 
Cream and whose Target equals the instance of type Blister in 2. (Different 
occurrences of the same variable across representations are understood to be
bound by the same existential quantifier.) The content of the pictures can be 
formalized as follows. (A more accurate formalization follows in Section 4. Note 
that ‘CreamOrOint’ is a disjunctive predicate which is true of both Creams and 
Ointments.) 

a. Tingle(e') & Exp(e') = p' & Person(p')

b. Develop(e'') & Result(e'') = b'' & Blister(b'') & Exp(e'') = p'' & Person(p'')

c. Apply(e''') & Actor(e''') = p''' & Person(p''') & Actee(e''') = c''' &
CreamOrOint(c''') & Target(e''') = d''' & Blister(d''').

Observe, firstly, that the two kinds of representations - the ones for the texts 
and the ones for the pictures - are highly similar. They can be viewed as expres- 
sions of one formal language, over the expressions of which a notion of logical 
consequence has been defined and which can be created by means of the same 
WYSIWYM mechanism. Note secondly, however, that the representations of text 
contain some types of information that the representations of pictures do not 
contain. For example, the pictures - unlike the texts - do not say that the Actor 
of c equals the Experiencers of a and b. Also, the distinction between a Cream 
and an Ointment is not made. This is as it should be, since the missing information
cannot reasonably be extracted from the pictures by looking at them. (For
example, creams and ointments look very much alike.) This phenomenon, which 
has been dubbed pictorial underspecificity, has been argued to imply that the 
following rule should be used to find a picture that matches a given part of the 
KB: 



Rule B: Use the logically strongest picture whose representation is lo- 
gically implied by Ss. 

Logical strength is determined on the basis of the representations alone. (As
usual, φ will count as being logically at least as strong as ψ iff φ logically
implies ψ.) Determining whether φ logically implies ψ, where each of the two
formulas is either an instance in the kb or a semantic representation of a picture,
is not difficult, given the fact that both are conjunctions of positive literals.
Crucially, Rule B does not require that all the relevant information is expressed 
by the picture, as long as no additional information (i.e., information not in Ss) 
is expressed. Arguments for the appropriateness of Rule B are provided in [4]. 
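
Rule B can be sketched as follows. This is an illustration under stated assumptions, not the implemented system: literals are encoded as tuples, with ("Actor", "e3", "p1") standing for Actor(e3) = p1; the SUPERTYPES table is a hypothetical fragment of the type hierarchy; the pictures "weaker" and "ointment" are invented library entries; and implication is tested by brute-force search for a substitution, which is adequate only for small representations.

```python
from itertools import product

# Hypothetical fragment of the type hierarchy: type -> its direct supertypes.
SUPERTYPES = {
    "Reader": {"Person"},
    "Cream": {"CreamOrOint"},
    "Ointment": {"CreamOrOint"},
}


def holds(literal, facts):
    """Is a literal entailed by the facts, given SUPERTYPES (one level only)?"""
    if literal in facts:
        return True
    if len(literal) == 2:  # type literal: (Type, instance)
        wanted, inst = literal
        return any(
            len(f) == 2 and f[1] == inst and wanted in SUPERTYPES.get(f[0], ())
            for f in facts
        )
    return False


def implies(facts, picture):
    """Return a substitution (picture variable -> term of facts) under which
    every literal of the picture follows from the facts, or None."""
    facts = set(facts)
    terms = sorted({t for f in facts for t in f[1:]})
    variables = sorted({t for lit in picture for t in lit[1:]})
    for values in product(terms, repeat=len(variables)):  # brute force
        theta = dict(zip(variables, values))
        if all(holds((lit[0], *(theta[t] for t in lit[1:])), facts) for lit in picture):
            return theta
    return None


def rule_b(facts, library):
    """Names of the logically strongest pictures among those implied by facts."""
    implied = {n: p for n, p in library.items() if implies(facts, p) is not None}

    def strictly_stronger(p, q):
        return implies(p, q) is not None and implies(q, p) is None

    return [
        n for n, p in implied.items()
        if not any(strictly_stronger(q, p) for q in implied.values())
    ]


# Clause 3 of the example, with the types of p1 and b2 carried over from 1 and 2.
S3 = [("Apply", "e3"), ("Actor", "e3", "p1"), ("Reader", "p1"),
      ("Actee", "e3", "c3"), ("Cream", "c3"),
      ("Target", "e3", "b2"), ("Blister", "b2")]

LIBRARY = {
    "c": [("Apply", "e"), ("Actor", "e", "p"), ("Person", "p"),
          ("Actee", "e", "c"), ("CreamOrOint", "c"),
          ("Target", "e", "d"), ("Blister", "d")],
    "weaker": [("Apply", "e"), ("Actor", "e", "p"), ("Person", "p")],  # hypothetical
    "ointment": [("Apply", "e"), ("Actee", "e", "c"), ("Ointment", "c")],  # hypothetical
}

print(rule_b(S3, LIBRARY))        # the strongest implied picture(s)
print(implies(S3, LIBRARY["c"]))  # which kb instance each picture variable matched
```

In this sketch the "ointment" picture is rejected because it expresses information not in the kb, and the "weaker" picture loses to picture c because c implies it but not vice versa. The substitution returned by implies records which kb instance each picture variable has been identified with.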

Conversely, it is well-known that pictures can sometimes be overspecific in that
they depict things that are not part of their intended meaning. In our example, 
for instance, the pictures are ‘forced’ to show on what part of your lips the 
tingling feeling occurs, even though this is immaterial to the instructions 1-3. 
The same pictures, however, may also be used in a situation where the location 
does matter. (For example, it may be relevant whether it occurs on the upper 
or on the lower lip.) As has been argued in [4], this can make it necessary for 
the library to contain several representations for one picture, some of which are 
more specific than others. This issue will not be further pursued here. Instead, 
we will concentrate on issues of document coherence. More specifically, we will 
explore the suitability of our representations for the expression of coreference 
relations between pictures. 



4 Pictorial Coreference 

Coreference relations between a picture and either another picture or a Noun 
Phrase in the textual part of the document have been discussed in a number 
of publications related to the wip project [2], where it was pointed out that 
it is crucial, for the readability of the document, that a reader grasps these 
coreference relations. It follows that, to express coreference relations, the system 
has to employ pictures that are appropriate for this purpose. In the case of our 
example, the reader has to grasp that the expression ‘the blister’ corefers with 
(i.e., has the same referent in the real world as) the depiction of a blister in 
Fig. 2. In the system currently under development, this coreference relation is 
established as a result of the fact that the picture in Fig. 2 comes to illustrate 
clause 2 of the example: the inference leading from the truth of the representation 
(2) of the clause to that of the representation (b) of the picture is achieved by 
unifying e_2 with e'', p_1 with p'' and b_2 with b'', and the latter equality encodes
the coreference relation between the blister in the kb and that in the picture. For
example, when the pictures in Figs. 1-3 have been used to illustrate the textual
representations 1-3, a side effect is that the following equalities are recorded:

e_1 = e', e_2 = e'', e_3 = e''', c_3 = c''',

b_2 = b'' = d''', p_1 = p' = p'' = p'''.

We will call these ‘resultative’ equalities, because they are the result of the act 
of illustration. Where does this leave cross-picture coreference relations? Some 
coreference relations are a consequence of purely textual coreference; for example, 
the identity of b'' and d''' follows from the identity of the blisters mentioned in
clauses 2 and 3. Analogously, the equality of p', p'', and p''' can be inferred. But
this is not always the case. Let us focus on locations. It is crucial to the meaning
of the leaflet that the following locations depicted in Figs. 1-3 are equal: (1) the
place of the tingling sensation, (2) the place where the blister develops, and (3)
the place where the cream must be applied. How can we make sure that the
representations of Figs. 1-3 reflect these equalities? Suppose variables l', l'', l'''
for locations were added to the representations a-c; then, in principle, the same
procedure could be applied as was described in connection with the blisters. Let
a variable, say l, be shared between the representations for the three textual
clauses:

1'. 1 & Location(e_1) = l

2'. 2 & Location(e_2) = l

3'. 3 & Location(e_3) = l.

Then each of l', l'', l''', in the representations of pictures, could illustrate l,
and the equality of l', l'' and l''' would arise as a consequence of their equality
with l. But without further modifications to the system, this approach would
not give the right results, because nothing in the new representations of the
pictures guarantees that the locations l', l'', l''' in them corefer. Suppose, for
example, the library contains another picture which is just like Fig. 3, except
that it shows a location l'''' somewhere else on the face being treated. This should
prevent l'''' from coreferring with l', l'', l''' but, in fact, it does not. (Recall that
the use of three different location variables l', l'', l''' did not prevent the
pictures containing them from illustrating the same location l.) What is
needed is a way of expressing, in the representations of two pictures, that they 
are guaranteed to refer to one and the same ‘entity’. In the case of the present 
example the entity is a location, but other types of entities can be involved as 
well. For example, one would like to avoid letting one and the same person in 
the KB be illustrated by means of two pictures, one of which depicts a man and 
the other a woman. The solution to this problem that will be sketched here, 
which hinges on allowing variable sharing between semantic representations of 
pictures, applies irrespective of the type of entities (locations, medicines, people, 
etc.) involved. 

Suppose one variable, l*, is employed to refer to the relevant locations in each
of the three pictures:

a'. a & Location(e') = l*
b'. b & Location(e'') = l*
c'. c & Location(e''') = l*.

Suppose, furthermore, that the library contains a picture, p, as described above,
depicting a location elsewhere on the face, which has the representation d:

d. Apply(e'''') & Actor(e'''') = p'''' & Person(p'''') & Actee(e'''') = c'''' &
CreamOrOint(c'''') & Target(e'''') = d'''' & Blister(d'''') & Location(e'''') =
l''''.

Up to now, c' and d are equally suitable to illustrate the action expressed in
3': in one case, l* is equated with l; in the other case, l'''' is equated with l. To
enforce a preference for the first option, we introduce a special principle:

Principle of coherent illustration: If more than one picture fulfils Rule B
then preference is given to a picture whose inclusion will lead to fewest
new (i.e., resultative) equalities.

This principle predicts a preference of c' over d as an illustration for 3', in a
situation where either a' or b' has already illustrated l. The less preferred
illustration leads to the resultative equality l = l'''', while the preferred
illustration does not lead to a new resultative equality involving l and l*, since
the equality l = l* has already been recorded. The preferred illustration gives
rise to a sequence of illustrations that is coherent in the sense that similar
pictures are used
for the illustration of similar events. The Principle of coherent illustration can be 
viewed as a special case of the idea, traceable to abductive theories of language 
interpretation, that those interpretations are preferred that require one to make 
fewest new assumptions [10]. 
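
The Principle of coherent illustration amounts to a simple counting preference, which can be sketched as follows. The sketch assumes that Rule B has already produced, for each candidate picture, a binding from picture variables to kb terms; the function names and the encoding of equalities as pairs are illustrative assumptions, and the shared location variable is written l* here.

```python
def new_equalities(binding, recorded):
    """Equalities (kb term, picture variable) that are not yet recorded."""
    return {(kb, pic) for pic, kb in binding.items()} - recorded


def choose(candidates, recorded):
    """Among pictures that already satisfy Rule B, prefer the one whose
    inclusion adds the fewest new resultative equalities."""
    return min(candidates, key=lambda n: len(new_equalities(candidates[n], recorded)))


# Equalities recorded when picture a' illustrated clause 1'.
recorded = {("e1", "e'"), ("p1", "p'"), ("l", "l*")}

# Bindings (picture variable -> kb term) for the two candidates for clause 3'.
candidates = {
    "c'": {"e'''": "e3", "p'''": "p1", "c'''": "c3", "d'''": "b2", "l*": "l"},
    "d": {"e''''": "e3", "p''''": "p1", "c''''": "c3", "d''''": "b2", "l''''": "l"},
}

print(choose(candidates, recorded))
```

Picture c' adds four new equalities and picture d adds five, because the equality between l and the shared location variable is already on record; the sketch therefore selects c'.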

So far (see Section 3), we have been making the simplifying assumption that 
pictures never express truthconditional information that is not already expres- 
sed in the text. Pictures for which this is true - the majority of pictures in the 
PILLS corpus fall into this category - can have several other functions, including 
emphasizing important parts of the leaflet and assisting non-native speakers of 
the language in which the leaflets are written. But clearly, pictures can express 
nonredundant truthconditional information and (approximate) identity of loca- 
tions is a prominent example of this phenomenon. For example, the texts of the 
leaflets tend to omit explicit specification of locations relying on the pictures to 
convey this information. Thus, the actual text of the leaflet corresponding with 
clause 2 of our initial example says 

“Zovirax Cold Sore Cream can prevent the cold sore from developing. If
the blister does develop, (...)” [1]

without saying where the blister will develop. Such ‘picture + text’ combinations,
where coreference relations between pictures make possible a simplification
of the text, would be a natural extension of the class of documents generated
by a wysiwym-type document generation system. Note that the feedback text
would still be required to express all the relevant information in the kb, to allow 
selection of the information to be illustrated. 

5 Alternatives and Extensions

The ‘pictorial’ extension of the pills system described here aims to provide 
insight into the way in which pictures convey information and offers a basis for 
experimenting with various extensions to optimize the usefulness of the interface. 
A number of extensions are currently being explored: 

— Adapting nl text. The last paragraph of the previous section contains a case 
where the output text was adapted as a result of information expressed by 
pictures: the locational information expressed by the latter was subtracted,
as it were, from the information to be expressed by the text. This idea
could be generalized to cover, for example, cases in which a picture contains 
information about sizes or non-discrete quantities. For example, if the kb 
describes a certain quantity of cream, then the text may express this simply 
as a ‘blob’ of cream, while the picture shows its approximate size. A type of 
case where text needs to be adapted as a result of pictures arises when pictures
are referred to by text (e.g. ‘Hold your inhaler as shown in the picture’).
A Ph.D. project on the generation of references to text and/or pictures has 
recently started at itri. 

— More system initiative. As was pointed out in the Introduction, we have in- 
itially opted for a setup in which the user of the system determines which 
parts of the kb merit illustration. It will be interesting to experiment with 
an alternative setup in which the system determines this on its own initia- 
tive (perhaps triggered by a global request of the user saying, effectively, 
‘Display all relevant illustrations’). In a simple version of this system, all 
pictures that Rule B would select for any part of the kb would be displayed 
as a result. More interestingly, one could add constraints on the set of sel- 
ected illustrations such as, for example, ‘Either illustrate all the actions in 
a given sequence or illustrate none of them’. Ultimately, this approach will 
force us to think through what makes an illustration useful, as opposed to 
merely logically suitable. 

— Selection by thumbnail. If the set of pictures in the library is of manageable 
size (see Section 6), it might be useful to display them as thumbnails and
to offer the user the option of selecting a picture from among them. After 
selection, the system takes over by determining whether the picture is suitable
to illustrate any part of the text - possibly taking the above-mentioned
constraints into account - and then finding the best place for inserting it
in the document. In most cases, this will be somewhere close to the place 
in the text where the relevant information is mentioned for the first time. 
(In our corpus, for example, texts about inhalers mention the inhaler many 
times, but illustration invariably happens on first mention only.) To do this 
well, the issue of locating sequences of pictures on one or more pages (e.g. 
[9]) needs to be addressed in detail, which we have not done so far. 

— Generation of pictures. In the Introduction, it has been pointed out that
generation - as opposed to selection - of pictures is not a practical option in
the PILLS domain. In other domains, where the pictures are more uniform, 
this can of course be different. A possible example is the domain of maritime 
law, the object of study in the clime project [14] in which the itri is a 
partner, where pictures could be usefully employed to highlight a part of a 
ship. In a domain of this kind, there is clearly room for generation of pictures 
and it would be interesting to investigate whether the methods described in 
the present paper (including, in particular, Rule B) could be employed to
generate a picture that is optimally appropriate for the illustration of a given 
part of the kb. 
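
The constraint mentioned under ‘More system initiative’ (either illustrate all the actions in a given sequence or none of them) can be sketched as a filter over the pictures that Rule B would select. The data layout is hypothetical: sequences is a list of action sequences from the kb, and illustrable maps each action for which Rule B found a picture to the name of that picture.

```python
def select_illustrations(sequences, illustrable):
    """Keep the illustrations of a sequence only if every action in it
    can be illustrated; otherwise illustrate none of its actions."""
    chosen = []
    for actions in sequences:
        if all(action in illustrable for action in actions):
            chosen.extend((action, illustrable[action]) for action in actions)
    return chosen


# Hypothetical kb content: two action sequences, one only partly illustrable.
sequences = [["e1", "e2", "e3"], ["e4", "e5"]]
illustrable = {"e1": "fig1", "e2": "fig2", "e3": "fig3", "e4": "fig7"}

print(select_illustrations(sequences, illustrable))
```

Here the second sequence is left without pictures because no picture is available for e5, even though one exists for e4.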

It should be noted that each of these extensions highlights the importance of 
using precise characterizations of the semantic content of a picture. Without 
such characterizations (i.e., without the logical representations, in our case), the 
system is unable to grasp the relation between a picture and either (a) a part of 
the KB or (b) a fragment of text expressing a part of the kb. 

6 Conclusion 

This paper has outlined an approach to document generation that has been 
implemented in a simple prototype system that allows the user of the system 

(1) to use feedback texts to ask for illustration of a part of the kb. Illustrations 
are selected from a library containing all 40 pictures used in the leaflets of 
one pharmaceutical company represented in our corpus. 

(2) to enter new pictures into the library and to create semantic representations 
that characterize the meaning of each of them, so that they can be selected 
by the process described under (1). 

This method makes crucial use of picture retrieval. The resulting retrieval task 
is rather different, however, from the task facing most picture retrieval systems. 
In such systems, where the number of pictures in the search space can be huge 
(sometimes more than 10®), it is often not feasible to create complex indexing 
terms for the pictures [6]. In the case of pills, however, the number of pictures in 
the library of one company is typically less than 100. As a result, more elaborate 
indexing techniques can be used including ones based on formal logic, and it is 
one such technique that has been explored in this paper. (For more elaborate 
discussion of the connection with information retrieval, see [5].) 

Having noted that the creation of complex indexing terms is always a time- 
consuming task, it is important to stress that this task is vastly simplified by 
the fact that wysiwym can be used for it. It will be useful to describe the situa- 
tion in the terminology of knowledge retrieval, based on the notion that, when 
the user of a wysiwym system indicates which part of the kb needs illustration, 
then this can be viewed as the formulation of a query. This means that wysiwym
is used for the creation of the controlled-term indexing of a multimedia
collection (= (2) above) and for the formulation of queries regarding the same
collection (= (1) above). This procedure has obvious potential advantages as
regards speed and accuracy of indexing - the extent of which is to be established
by a comparison with other methods in future research - while at the same time 
allowing great richness of logical detail (cf. [19]). In addition, the interface gua- 
rantees that there always is an exact match between the indexing terms and the 
terms in the query, instead of the imprecise matches that were a crucial factor 
in motivating the use of statistical methods in information retrieval [5].

So far, we have focused exclusively on document generation - and this is the
only area to which the method described in Sections 3-5 of this paper has been
applied - but this is not the only possible application. In particular, what was 
said in the previous paragraph suggests an interesting alternative use that could 
be made of wysiwym in connection with (regular) picture retrieval. There is 
nothing in WYSIWYM that requires the use of complex logical formalism (which 
may be too complex to apply to a large library even in combination with wy- 
siwym). For example, one might use a wysiwym interface in combination with 
the logically simple indexing schemes that are currently used for picture retrie- 
val (e.g. ICONCLASS, [20]). If this is done, wysiwym might be used to achieve 
improvements in the speed and accuracy with which such indexing schemes can 
be used for creating queries and indexing terms pertaining to moderately large 
pictorial libraries. 

It is important to note that, in the approach outlined in this paper, there is no 
need for the system to ‘interpret’ a query. Even in those versions of the system 
where the author selects the parts of the kb that require illustration by selec- 
ting a text span in the document, there is no need to parse the text span: the 
text span has been generated from the kb in the first place and retrieving the 
particular part of the kb that has been verbalized by this text span is trivial. 
Potential ambiguities in the text - no matter how undesirable they might be for 
other reasons - do not pose a problem for the illustration task. 

Much of the present paper has focused on the representation of coreference. We 
have argued that the representation of coreference requires the use of variables 
shared between different representations (or some equivalent device e.g. [17]) in 
the semantic representations for pictures. This fact seems to have been overloo- 
ked so far. The received view on this topic can be traced back at least to the 
discussion between E. Sober and R. Howell, who analyse pictures as formulas all 
of whose variables are existentially quantified, while concatenation of pictures is 
represented by nothing else than logical conjunction [11]. For these authors, a 
concatenation of two pictures will have the form i rather than ii (x is the list of
variables occurring in either φ or ψ):

i. ∃xφ & ∃xψ

ii. ∃x(φ & ψ),

where ∃xφ represents the semantic content of one picture and ∃xψ that of the
other. Note that ii (but not i) forces variables shared between φ and ψ to corefer.
It is essentially this property of ii that is exploited in the various formalisms of
‘dynamic’ semantics, such as Discourse Representation Theory (e.g. [7]), which 
try to capture certain laws governing anaphora in natural language. 
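
The difference between the two forms can be made explicit with a small worked instance. The two picture contents below are hypothetical simplifications of the tingling and blister pictures, sharing one location variable l:

```latex
\begin{align*}
\varphi &= \mathit{Tingle}(e') \wedge \mathit{Location}(e') = l, \qquad
\psi = \mathit{Develop}(e'') \wedge \mathit{Location}(e'') = l \\[4pt]
\text{(i)}\quad
 & \exists e'\,\exists e''\,\exists l\,\varphi \;\wedge\; \exists e'\,\exists e''\,\exists l\,\psi \\
 &\equiv\; \exists e'\,\exists l_1\,\bigl(\mathit{Tingle}(e') \wedge \mathit{Location}(e') = l_1\bigr)
   \;\wedge\; \exists e''\,\exists l_2\,\bigl(\mathit{Develop}(e'') \wedge \mathit{Location}(e'') = l_2\bigr) \\[4pt]
\text{(ii)}\quad
 & \exists e'\,\exists e''\,\exists l\,\bigl(\mathit{Tingle}(e') \wedge \mathit{Location}(e') = l
   \wedge \mathit{Develop}(e'') \wedge \mathit{Location}(e'') = l\bigr)
\end{align*}
```

In (i) each conjunct has its own quantifier, so the bound variable l can be renamed apart and nothing requires the two locations to coincide. In (ii) a single quantifier binds both occurrences of l, so the tingling and the blister are asserted to be at the same location.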

Implementing the approach outlined in this paper in a full-blown document 
generation system requires that a number of issues are resolved that cannot be 
discussed here. For example, images depicting the same state of affairs may differ 
in terms of their size, colour, or pictorial ‘style’. This is of particular importance 
when pictures are part of a coherent sequence, since such pictures are normally 
expected to be similar in size, colour, and style. This problem can be tackled by 
separating the selection process from the presentation process by allowing that 
sets of semantically equivalent pictures (i.e., pictures that have logically equiva- 
lent representations) are selected, while the subsequent choice from among this 
set is carried out by other methods [3], which take other than truthconditional 
aspects of pictures into account. 

Abstract. The availability of contiguous and non-contiguous multiword lexical
units (MWUs) in Natural Language Processing (NLP) lexica enhances parsing 
precision, helps attachment decisions, improves indexing in information 
retrieval (IR) systems, reinforces information extraction (IE) and text mining, 
among other applications. Unfortunately, their acquisition has long been a 
significant problem in NLP, IR and IE. In this paper we propose two new 
association measures, the Symmetric Conditional Probability (SCP) and the 
Mutual Expectation (ME) for the extraction of contiguous and non-contiguous 
MWUs. Both measures are used by a new algorithm, the LocalMaxs, that 
requires neither empirically obtained thresholds nor complex linguistic filters. 
We assess the results obtained by both measures by comparing them with
reference association measures (Specific Mutual Information, φ², Dice and
Log-Likelihood coefficients) over a multilingual parallel corpus. An additional
experiment has been carried out over a part-of-speech tagged Portuguese corpus 
for extracting contiguous compound verbs. 



1 Introduction 

The acquisition of MWUs has long been a significant problem in NLP, being 
relegated to the borders of lexicographic treatment. The access to large-scale text 
corpora in machine-readable formats has recently originated a new interest in 
phraseology. The evolution from rule based formalisms towards lexicalization, that is 
the evolution from “general” grammar rules towards rules specifying the usage of 
words on a case-by-case basis, has been followed by a great deal of studies and 
proposals for the treatment of compound and frozen expressions. Studies presented in 
[1] and [18] postulate that MWUs embody general grammatical rules and obey
flexibility constraints.

The automatic extraction of multiword lexical units from specialised language 
corpora is an important issue. However, most of these units are not listed in current 
dictionaries. Multiword lexical units are compound nouns (Zimbabwean minister of
foreign affairs, Fonds Social Européen -the French expression for ‘European Social
Fund’-, bacalhau à braz -a Portuguese dish-), frozen phrases (raining cats and dogs,
plus ou moins -the French phrase for ‘more or less’-, dando que se recebe -a Brazilian
expression that might be translated as ‘by giving one may receive’-), compound verbs
(to take into account, mettre au point -‘to fix’ in French-, pôr em causa -‘to doubt’ in
Portuguese-), prepositional locutions (as a result of, en raison de -‘because of’ in
French-, a partir de -‘after’ or ‘since’ in Portuguese-), adverbial locutions (from time
to time, dès que possible -the French expression for ‘as soon as possible’-, por
exemplo -the Portuguese phrase for ‘for instance’-). It is clear that such units should
automatically be extracted from corpora, in order to enable their rapid incorporation 
into NLP specialised lexica. Such dynamic lexical databases would enable parsers to 
be more effective and efficient. Moreover, MWUs and relevant expressions may be 
used for refining information retrieval searches [25], enhancing precision, recall and 
the naturalness of the resulting interaction with the user. 

Besides, information about the structure of MWUs should also be available in the 
NLP lexica. Indeed, one should not only find contiguous MWUs (i.e. uninterrupted 
sequences of words) but also non-contiguous MWUs (i.e. fixed sequences of words 
interrupted by one or several gaps filled in by interchangeable words that usually are 
synonyms). Non-contiguous MWUs may be exemplified by the following sequences: 

a total _____ of, where the gap may be filled by nouns like cost or population;

fournir _____ sur _____ (i.e. a French compound verb for ‘to give something about
someone’), where the gaps may be filled in with sequences such as des informations
(i.e. some information), which have the morpho-syntactic combination Article+Noun;

and um _____ número de (i.e. a Portuguese noun phrase for ‘a number of’), where the
gap may be instantiated by occurrences of Adjectives like determinado or certo
(which would result in the English expression ‘a determined number of’ or ‘a certain
number of’). This kind of information, if it were available in lexica, would greatly
help attachment decisions and, as a consequence, would increase the precision of
parsers.

The research community has adopted four distinct policies in order to retrieve 
MWUs. Some approaches only extract contiguous multiword lexical units and require 
language-dependent information such as part-of-speech tags and base their analysis 
on syntactical regularities or linguistic resources such as dictionaries ([11], [7] and 
[3]). In order to scale up the acquisition process, other language-dependent 
approaches combine shallow morpho-syntactic information with statistics in order to 
identify syntactical regularities and then select the most probable candidate sequences 
of words ([16], [21] and [12]). Some other language-dependent systems prefer to use 
in a first stage statistical techniques to calculate how correlated (associated, 
aggregated) are the words of a bigram and then apply frequency or/and correlation 
thresholds ([28] and [10]) in order to extract candidate units. The candidates are then 
pruned by using morpho-syntactic information. Finally, some purely statistical 
approaches propose language-independent techniques for the extraction of contiguous 
and non-contiguous multiword lexical units. They evidence regularities by means of 
association measure values that evaluate the mutual attraction or “glue” that stands 
between words in a sequence ([9], [29], [8], and [23]). However, the systems 
presented so far in the literature rely on ad hoc establishment of frequency or/and 
association measure thresholds that are prone to error. Indeed, thresholds pose 
important empirical problems related to their value that depends on the corpus size 
and other factors introduced by the researcher [29]. Besides, the proposed statistical
measures are usually not applied to generic n-grams¹ (n > 2), as they are limited to
bigrams.

In this paper, we propose two systems based exclusively on statistical 
methodologies that retrieve from naturally occurring text, contiguous and non- 
contiguous MWUs. In order to extract the MWUs, two new association measures, the 
Symmetric Conditional Probability (SCP) and the Mutual Expectation (ME) are used 
by a new multiword lexical unit acquisition process based on the LocalMaxs 
algorithm [24]. The proposed approaches cope with two major problems evidenced by 
all previous works in the literature: the definition of ad hoc frequency and/or 
association measure thresholds used to select MWUs among word groups and the 
limited application of the association measures (considering the length of n-gram). 
The introduction of the LocalMaxs algorithm, which relies on local maxima of the
association measure (or "glue") of every n-gram, avoids the classical problem of the
definition of global thresholds. So, our methodology does not require the definition of 
any threshold. Moreover, two normalisation processes are introduced in order to 
accommodate the MWU length factor. So, both approaches measure not only the 
"glue" within each bigram but also within every n-gram, with n>2. 

In order to extract contiguous MWUs, we used the SCP measure and the
LocalMaxs algorithm, since the SCP measure has been shown to be appropriate for
capturing contiguous compound nouns, proper names, and other compound sequences
recognised as “natural” lexical units. For the case of the non-contiguous MWUs, we
used the ME measure and the LocalMaxs algorithm, since this measure shows great
ability to capture collocations and other non-contiguous multiword lexical units. In
the next sections, we will use one of these measures depending on the kind of MWUs
we want to extract (i.e. contiguous or non-contiguous MWUs), and we compare their
performances with other well-known statistics that are previously normalised: the
Specific Mutual Information [9], the φ² [17], the Dice coefficient [27] and the
Log-Likelihood ratio [15].

In the second section, we present the LocalMaxs algorithm for the election of
MWUs. In sections 3 and 4, we expand on the SCP and ME measures and include
the normalisation used by each measure. Using multilingual parallel corpora of
political debate² and Portuguese corpora, in the fifth and sixth sections, we
respectively show the results for the contiguous and non-contiguous MWUs and
compare both measures with the association measures mentioned above (Specific
Mutual Information, φ², Dice coefficient and the Log-Likelihood ratio). In the seventh
section we assess related work. Finally, in the eighth section we
present conclusions and future work.



¹ An n-gram is a group of words in the corpus. We use the notation $[w_1 \ldots w_n]$ or $w_1 \ldots w_n$ to refer
to the n-gram of length n.

² The corpus has been extracted from the European Parliament multilingual debate collection
which has been purchased from the European Language Resources Association (ELRA) - 
http://www.icp.grenet.fr/ELRA/home.html. 
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2 The LocalMaxs Algorithm 

Most of the approaches proposed for the extraction of multiword lexical units are 
based on association measure thresholds ([9], [12], [23] and [27]). This is defined by 
the underlying concept that there exists a limit association measure that allows one to 
decide whether an n-gram is a MWU or not. But, these thresholds can only be 
justified experimentally and so are prone to error. Moreover, the thresholds may vary 
with the type, the size and the language of the document and vary obviously with the 
association measure. The LocalMaxs algorithm [24] proposes a more robust, flexible 
and fine tuned approach for the election of MWUs. 

The LocalMaxs algorithm works based on the idea that each n-gram has a kind of 
"glue" sticking the words together within the n-gram. Different n-grams usually have 
different "glues". As a matter of fact one can intuitively accept that there is a strong 
"glue" within the bigram [Margaret, Thatcher] i.e. between the words Margaret and 
Thatcher. On the other hand, one cannot say that there is a strong "glue" for example
within the bigram [if, every] or within the bigram [of, two]. So, let us suppose we have
a function g(.)³ that measures the "glue" of each n-gram. The LocalMaxs is an
algorithm that works with a corpus as input and automatically produces MWUs from
that corpus.

The LocalMaxs algorithm elects the multiword lexical units from the set of all the 
cohesiveness-valued n-grams based on two assumptions. First, the association 
measures show that the more cohesive a group of words is, the higher its score will
be. Second, MWUs are highly associated localised groups of words. As a
consequence, an n-gram, W, is a MWU if its association measure value, g(W), is a
local maximum. Let’s define the set of the association measure values of all the
(n-1)-grams contained in the n-gram W, by $\Omega_{n-1}$, and the set of the association measure values
of all the (n+1)-grams containing the n-gram W, by $\Omega_{n+1}$. The LocalMaxs algorithm is
defined as follows:

Algorithm 1: The LocalMaxs 

$\forall x \in \Omega_{n-1},\ \forall y \in \Omega_{n+1}$: W is a MWU if
(length(W) = 2 and g(W) > y) or
(length(W) > 2 and x < g(W) and g(W) > y)

So, an n-gram will be a MWU if its g(.) value under that association measure 
corresponds to a local maximum, as it is shown in Fig. 1. 

The reader will notice that, for the contiguous case, the $\Omega_{n-1}$ set is reduced to the
association measure values of the following two (n-1)-grams: $[w_1 \ldots w_{n-1}]$ and $[w_2 \ldots w_n]$.

³ We will write g(W) for the g(.) value of the generic n-gram W and $g([w_1 \ldots w_n])$ for the g(.)
value of the n-gram $[w_1 \ldots w_n]$, since we want to keep g(.) as a one-argument function. We will
instantiate this generic function by using various n-gram word association functions, namely
the SCP(.) and the ME(.), that will be one-argument functions too. So, we can write for
example ME(W), $SCP([w_1, w_2, w_3])$, $SCP([w_1 \ldots w_n])$, etc.

⁴ The entropy measure used by [23] is one of the exceptions.

And, the $\Omega_{n+1}$ set is reduced to the association measure values of all the contiguous
(n+1)-grams that contain the contiguous n-gram W. For the non-contiguous case,
there are no such restrictions for the $\Omega_{n-1}$ and the $\Omega_{n+1}$ sets. All the possible combinations of
(n-1)-grams and (n+1)-grams related with W are taken into account.

The LocalMaxs algorithm avoids the ad hoc definition of any global association 
measure threshold and focuses on the identification of local variations of the 
association measure values. This methodology overcomes the problems of reliability 
and portability of the previously proposed approaches. Indeed, any association 
measure that shares the first assumption (i.e. the more cohesive a group of words is, 
the higher its score will be) can be tested on this algorithm. For the purpose of our 
study, we applied the LocalMaxs algorithm to the Symmetrical Conditional
Probability, the Mutual Expectation, the Specific Mutual Information, the φ², the Dice
coefficient and the Log-Likelihood ratio.

One other interesting property of the LocalMaxs algorithm is the fact that it elects
multiword lexical units on a localised basis, allowing the extraction of MWUs formed
by the juxtaposition of MWUs⁵.




[Figure: g(.) values for the n-grams Human Rights, Human Rights in, Human Rights in East, Human Rights in East Timor and Human Rights in East Timor are]

Fig. 1. The “glue” values of the n-grams



⁵ This property points at a partial solution of the problem of the overcomposition by
juxtaposition illustrated by [12].

For example, the algorithm will elect as MWUs the n-grams Human Rights and
Human Rights in East Timor as they are linked to local maxima, as shown in Fig. 1.
Roughly exemplifying, the g(.) value of Human Rights is higher than the g(.) of
Human Rights in, since in current text many unigrams could follow the bigram
Human Rights (not only the unigram in) and many bigrams may precede the unigram
in. The g(.) value for the 4-gram Human Rights in East is higher than the g(.) of the
3-gram Human Rights in, however it will not be elected as MWU because the 5-gram
Human Rights in East Timor that contains the previous 4-gram has a higher g(.) value.
This 5-gram will be elected since there are neither 4-grams contained in that 5-gram
nor 6-grams containing the same 5-gram with a higher g(.) value. Although it is not
mentioned here, the n-gram East Timor is also elected.
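The election rule of Algorithm 1 can be sketched in a few lines of Python for the contiguous case. This is a minimal illustration, not the authors' implementation: the function name `local_maxs`, the dictionary-based input and the toy glue values are all assumptions, and the comparison with the contained (n-1)-grams uses the strict inequality exactly as it is printed in Algorithm 1.

```python
from typing import Dict, List, Tuple

NGram = Tuple[str, ...]


def local_maxs(glue: Dict[NGram, float]) -> List[NGram]:
    """Return the contiguous n-grams (n >= 2) whose glue is a local maximum."""
    elected = []
    for w, g_w in glue.items():
        n = len(w)
        if n < 2:
            continue
        # Omega_{n-1}: glue of the two contiguous (n-1)-grams contained in w
        # (only consulted when n > 2, and only where a value is available).
        omega_sub = [glue[s] for s in (w[:-1], w[1:]) if s in glue]
        # Omega_{n+1}: glue of the contiguous (n+1)-grams that contain w.
        omega_super = [g for v, g in glue.items()
                       if len(v) == n + 1 and (v[:-1] == w or v[1:] == w)]
        above_parts = n == 2 or all(x < g_w for x in omega_sub)
        above_extensions = all(g_w > y for y in omega_super)
        if above_parts and above_extensions:
            elected.append(w)
    return elected


# Hypothetical glue values shaped like Fig. 1 (not taken from the paper).
toy_glue = {
    ("Human", "Rights"): 0.8,
    ("Human", "Rights", "in"): 0.2,
    ("Human", "Rights", "in", "East"): 0.3,
    ("Human", "Rights", "in", "East", "Timor"): 0.6,
    ("Human", "Rights", "in", "East", "Timor", "are"): 0.1,
}
mwus = local_maxs(toy_glue)
```

With these toy values only Human Rights and Human Rights in East Timor are elected, which mirrors the behaviour described for Fig. 1. No threshold appears anywhere: each n-gram is compared only with its own neighbours.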



3 Extracting Contiguous MWUs from Corpora 

We have used three tools [24] that work together in order to extract contiguous 
MWUs from any corpus: 

-The LocalMaxs algorithm 

-The Symmetric Conditional Probability (SCP) statistical measure 
-The Fair Dispersion Point Normalisation 



3.1 The Symmetrical Conditional Probability Measure

Let’s consider the bigram [x,y]. We say that the "glue" value of the bigram [x,y] 
measured by SCP(.) is: 

$$SCP([x,y]) = p(x \mid y) \cdot p(y \mid x) = \frac{p(x,y)^2}{p(x) \cdot p(y)} \qquad (3.1)$$

where p(x,y), p(x) and p(y) are respectively the probabilities of occurrence of the
bigram [x,y] and the unigrams [x] and [y] in the corpus; p(x|y) stands for the
conditional probability of occurrence of x in the first (left) position of a bigram given
that y appears in the second (right) position of the same bigram. Similarly p(y|x)
stands for the probability of occurrence of y in the second (right) position of a bigram
given that x appears in the first (left) position of the same bigram.
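The step from the product of the two conditional probabilities to the squared joint probability follows from the definition of conditional probability. The worked lines below spell it out; the numeric values are hypothetical and chosen only to make the arithmetic easy to follow.

```latex
\begin{align*}
SCP([x,y]) &= p(x \mid y)\, p(y \mid x)
            = \frac{p(x,y)}{p(y)} \cdot \frac{p(x,y)}{p(x)}
            = \frac{p(x,y)^2}{p(x)\, p(y)} \\
% hypothetical values: p(x,y) = 0.005, p(x) = 0.01, p(y) = 0.02
           &= \frac{0.005^2}{0.01 \times 0.02}
            = \frac{0.000025}{0.0002} = 0.125
\end{align*}
```

Assuming consistent probability estimates, so that p(x,y) never exceeds p(x) or p(y), the value is at most 1 and reaches 1 only when x and y never occur apart, i.e. p(x,y) = p(x) = p(y).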

3.2 The Fair Dispersion Point Normalisation 

Considering the denominator of the equation (3.1), we can think about any n-gram as
a “pseudo-bigram” having a left part [x] and a right part [y]. The Fair Dispersion
Point Normalisation or simply Fair Dispersion "transforms" any n-gram of any size into
a “pseudo-bigram” and embodies all the possibilities to have two adjacent groups of
words from the whole original n-gram. Thus, applying the Fair Dispersion Point
Normalisation to SCP(.) in order to measure the "glue" of the n-gram $[w_1 \ldots w_n]$, we
substitute the denominator of the equation (3.1) by Avp defined in Equation (3.2):

$$Avp = \frac{1}{n-1} \sum_{i=1}^{n-1} p(w_1 \ldots w_i) \cdot p(w_{i+1} \ldots w_n) \qquad (3.2)$$

So, we have the normalised SCP defined in Equation (3.3):

$$SCP\_f([w_1 \ldots w_n]) = \frac{p(w_1 \ldots w_n)^2}{Avp} \qquad (3.3)$$
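As an illustration of Equations (3.2) and (3.3), the following Python sketch computes the normalised measure for a contiguous n-gram. It assumes that a probability lookup `prob` is available for every contiguous sub-sequence; the function name `scp_f`, that lookup and the toy probabilities are illustrative assumptions, not part of the original system.

```python
from typing import Callable, Sequence, Tuple

NGram = Tuple[str, ...]


def scp_f(ngram: Sequence[str], prob: Callable[[NGram], float]) -> float:
    """SCP with the Fair Dispersion Point Normalisation, Equations (3.2)-(3.3)."""
    w = tuple(ngram)
    n = len(w)
    if n < 2:
        raise ValueError("SCP_f needs an n-gram with at least two words")
    # Avp: average, over the n-1 dispersion points, of p(left part) * p(right part).
    avp = sum(prob(w[:i]) * prob(w[i:]) for i in range(1, n)) / (n - 1)
    return prob(w) ** 2 / avp if avp > 0 else 0.0


# Hypothetical probabilities, for illustration only.
toy_prob = {
    ("a",): 0.1, ("b",): 0.2, ("c",): 0.1,
    ("a", "b"): 0.05, ("b", "c"): 0.04,
    ("a", "b", "c"): 0.02,
}
glue_ab = scp_f(("a", "b"), toy_prob.__getitem__)
glue_abc = scp_f(("a", "b", "c"), toy_prob.__getitem__)
```

For a bigram there is a single dispersion point, so the function reduces to Equation (3.1); for longer n-grams every split into a left and a right part contributes equally to the denominator.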



As notation matters, in SCP_f(.), we have added "_f" for "fair" (from Fair
Dispersion) to SCP(.). As has been shown in [24], the Fair Dispersion Point
Normalisation concept can be applied to other statistical measures in order to obtain a
"fair" measure of the association or "glue" of any n-gram of size longer than 2.

4 Extracting Non-contiguous MWUs from Corpora 

We have used four tools ([24] and [13]) that work together in order to extract non- 
contiguous MWUs from any corpus: 

-The LocalMaxs algorithm 

-The Normalised Expectation measure 

-The Fair Point of Expectation 

-The Mutual Expectation (ME) statistical measure 



4.1 The Normalised Expectation Measure 

We define the normalised expectation existing between n words as the average 
expectation of the occurrence of one word in a given position knowing the occurrence 
of the other n-1 words also constrained by their positions. The basic idea of the 
normalised expectation is to evaluate the cost, in terms of cohesiveness, of the 
possible loss of one word in an n-gram. The more cohesive a word group is, that is the 
less it accepts the loss of one of its components, the higher its normalised expectation 
will be. 

The underlying concept of the normalised expectation is based on the conditional 
probability defined in Equation (4.1). The conditional probability measures the 
expectation of the occurrence of the event X=x knowing that the event Y=y stands. 
p(X=x,Y=y) is the joint discrete density function between the two random variables X, 
Y and p(Y=y) is the marginal discrete density function of the variable Y. 



$$p(X=x \mid Y=y) = \frac{p(X=x, Y=y)}{p(Y=y)} \qquad (4.1)$$



The Fair Point of Expectation. Naturally, an n-gram is associated to n possible 
conditional probabilities. It is clear that the conditional probability definition needs to 
be normalised in order to take into account all the conditional probabilities involved 
in an n-gram. 

Let’s take the n-gram $[w_1\, p_{12}\, w_2 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n]$ where $p_{1i}$, for i=2,...,n,
denotes the signed distance that separates word $w_i$ from word $w_1$⁶. It is convenient to
consider an n-gram as the composition of n sub-(n-1)-grams, obtained by extracting
one word at a time from the n-gram. This can be thought of as giving rise to the
occurrence of any of the n events illustrated in Table 1 where the underline denotes
the missing word from the n-gram. So, each event is associated to a respective
conditional probability. One of the principal intentions of the normalisation process is
to capture in just one measure all the n conditional probabilities. One way to do it is
to blueprint the general definition of the conditional probability and define an average
event for its conditional part, that is an average event Y=y.

⁶ This n-gram is equivalent to $[w_1\, p_{12}\, w_2\, p_{23}\, w_3 \ldots p_{2i}\, w_i \ldots p_{2n}\, w_n]$ where $p_{2i} = p_{1i} - p_{12}$ for i=3,...,n
and $p_{2i}$ denotes the signed distance that separates word $w_i$ from word $w_2$.

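Table 1. Sub-(n-1)-grams and missing words

Sub-(n-1)-gram | Missing word
[ _____ $w_2\, p_{23}\, w_3 \ldots p_{2i}\, w_i \ldots p_{2n}\, w_n$ ] | $w_1$
[ $w_1$ _____ $p_{13}\, w_3 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n$ ] | $w_2$
...
[ $w_1\, p_{12}\, w_2\, p_{13}\, w_3 \ldots p_{1(i-1)}\, w_{i-1}$ _____ $p_{1(i+1)}\, w_{i+1} \ldots p_{1n}\, w_n$ ] | $w_i$
...
[ $w_1\, p_{12}\, w_2\, p_{13}\, w_3 \ldots p_{1i}\, w_i \ldots p_{1(n-1)}\, w_{n-1}$ _____ ] | $w_n$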


Indeed, only the n denominators of the n conditional probabilities vary and the n
numerators remain unchanged from one probability to another. So, in order to
perform a sharp normalisation process, it is convenient to evaluate the gravity centre
of the denominators thus defining an average event called the fair point of expectation
(FPE). Basically, the FPE is the arithmetic mean of the n joint probabilities of the n
(n-1)-grams contained in an n-gram. The fair point of expectation for an n-gram is
defined in Equation (4.2).

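$$FPE([w_1\, p_{12}\, w_2 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n]) = \frac{1}{n} \left( p([w_2 \ldots p_{2i}\, w_i \ldots p_{2n}\, w_n]) + \sum_{i=2}^{n} p([w_1 \ldots \widehat{p_{1i}\, w_i} \ldots p_{1n}\, w_n]) \right) \qquad (4.2)$$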



$p([w_2 \ldots p_{2i}\, w_i \ldots p_{2n}\, w_n])$, for i=3,...,n, is the probability of the occurrence of the
(n-1)-gram $[w_2 \ldots p_{2i}\, w_i \ldots p_{2n}\, w_n]$ and $p([w_1 \ldots \widehat{p_{1i}\, w_i} \ldots p_{1n}\, w_n])$ is the
probability of the occurrence of one (n-1)-gram containing necessarily the first word
$w_1$. The "^" corresponds to a convention frequently used in Algebra that consists in
writing a "^" on the top of the omitted term of a given succession indexed from 1 to n.



⁷ In the case of n = 2, the FPE is the arithmetic mean of the marginal probabilities.

Hence, the normalisation of the conditional probability is achieved by introducing
the FPE into the general definition of the conditional probability. The resulting
symmetric measure is called the normalised expectation and is proposed as a "fair"
conditional probability. It is defined in Equation (4.3)⁸.



$$NE([w_1 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n]) = \frac{p([w_1 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n])}{FPE([w_1 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n])} \qquad (4.3)$$



$p([w_1 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n])$ is the probability of the n-gram $[w_1 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n]$ occurring
among all the other n-grams and $FPE([w_1 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n])$ is the fair point of
expectation defined in Equation (4.2).
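For the bigram case the two definitions collapse into a compact form. The short derivation below is a sketch that follows directly from Equations (4.2) and (4.3) with n = 2, where the two sub-(n-1)-grams are simply the two unigrams:

```latex
\begin{align*}
FPE([w_1\, p_{12}\, w_2]) &= \frac{1}{2}\,\bigl( p([w_2]) + p([w_1]) \bigr) \\
NE([w_1\, p_{12}\, w_2])  &= \frac{p([w_1\, p_{12}\, w_2])}{FPE([w_1\, p_{12}\, w_2])}
                           = \frac{2\, p([w_1\, p_{12}\, w_2])}{p([w_1]) + p([w_2])}
\end{align*}
```

This is the expression that the normalised expectation shares with the Dice coefficient [27] for bigrams; for n > 2 the two measures are different.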



4.2 The Mutual Expectation Measure 

[23] shows that one effective criterion for multiword lexical unit identification is
simple frequency. From this assumption, we deduce that between two n-grams with
the same normalised expectation, that is with the same value measuring the possible
loss of one word in an n-gram, the most frequent n-gram is more likely to be a
multiword unit. So, the Mutual Expectation between n words is defined in Equation
(4.4) based on the normalised expectation and the simple frequency. It is a weighted
normalised expectation.

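$$ME([w_1 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n]) = f([w_1 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n]) \times NE([w_1 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n]) \qquad (4.4)$$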



$f([w_1 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n])$ and $NE([w_1 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n])$ are respectively the absolute
frequency of the particular n-gram $[w_1 \ldots p_{1i}\, w_i \ldots p_{1n}\, w_n]$ and its normalised
expectation.
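The chain from sub-(n-1)-grams to the Mutual Expectation can be sketched as follows. The encoding of a positional n-gram as a tuple of (signed distance from the first word, word) pairs, the helper names, and the `prob` and `freq` lookups are assumptions made for this illustration; the re-basing of distances when the first word is removed follows the equivalence between the $p_{1i}$ and $p_{2i}$ notations stated for the positional n-gram.

```python
from typing import Callable, List, Tuple

# A positional n-gram: ((0, w1), (p12, w2), ..., (p1n, wn)), where each integer is
# the signed distance from the first word. This encoding is an assumption.
PosNGram = Tuple[Tuple[int, str], ...]


def rebase(items: PosNGram) -> PosNGram:
    """Re-express distances relative to the first remaining word."""
    base = items[0][0]
    return tuple((dist - base, word) for dist, word in items)


def sub_ngrams(ngram: PosNGram) -> List[PosNGram]:
    """The n sub-(n-1)-grams obtained by removing one word at a time."""
    if len(ngram) < 2:
        raise ValueError("at least two words are needed")
    return [rebase(ngram[:i] + ngram[i + 1:]) for i in range(len(ngram))]


def fpe(ngram: PosNGram, prob: Callable[[PosNGram], float]) -> float:
    """Fair point of expectation: mean probability of the sub-(n-1)-grams (4.2)."""
    subs = sub_ngrams(ngram)
    return sum(prob(s) for s in subs) / len(subs)


def normalised_expectation(ngram: PosNGram,
                           prob: Callable[[PosNGram], float]) -> float:
    """Equation (4.3): probability of the n-gram divided by its FPE."""
    denom = fpe(ngram, prob)
    return prob(ngram) / denom if denom > 0 else 0.0


def mutual_expectation(ngram: PosNGram,
                       prob: Callable[[PosNGram], float],
                       freq: Callable[[PosNGram], float]) -> float:
    """Equation (4.4): absolute frequency times normalised expectation."""
    return freq(ngram) * normalised_expectation(ngram, prob)


# Hypothetical statistics for the gapped bigram "a _ b" (distance 2).
gapped = ((0, "a"), (2, "b"))
toy_prob = {((0, "a"),): 0.1, ((0, "b"),): 0.3, gapped: 0.05}
toy_freq = {gapped: 50}
ne_value = normalised_expectation(gapped, toy_prob.__getitem__)
me_value = mutual_expectation(gapped, toy_prob.__getitem__, toy_freq.__getitem__)
```

Keeping the distances inside the n-gram key is what lets the same functions serve contiguous and non-contiguous units: a contiguous n-gram is just the special case in which consecutive distances differ by one.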

It should be stressed that, apart from representational differences related to our
objectives (i.e. to extract contiguous and non-contiguous n-grams) that gave rise to
two representations (one of them with word positional information (section 4)), there are
important differences between (3.3) and (4.4) although the numerators are identical.
Indeed, the denominators given by (3.2) and (4.2) are different and are obtained assuming
different smoothing strategies, due to initial research objectives.

5 Results for Contiguous MWUs 

5.1 Comparing SCP_f with Other Statistics-Based Measures

In order to assess the results given by the SCP_f measure, several measures were
tested using the Fair Dispersion Point Normalisation⁹, including: the Specific Mutual
Information (SI) ([9], [10] and [2]), the SCP [24], the Dice coefficient [27], the
Loglikelihood ratio [15], and the φ² coefficient [17]. Table 2 contains scores for these
statistical measures working with the Fair Dispersion Point Normalisation and the
LocalMaxs algorithm. We have used an average size corpus (919,253 words)¹⁰.

⁸ The Normalised Expectation measure is different from the Dice coefficient introduced by
[27] although they share the same expression for the case of bigrams.

⁹ As a matter of fact, any n-gram can be divided in a left and a right part choosing any point
between two adjacent words within the n-gram. In this way, one can measure the "glue"
using some usual statistical measure without the Fair Dispersion, but the results are



The Evaluation Criterion. The LocalMaxs algorithm extracts n-grams, which are
potential MWUs or relevant expressions. In order to decide if an extracted n-gram is a
MWU or a relevant expression or not, we considered as correct ones: proper names,
such as Yasser Arafat, Republica Centro Africana¹¹, etc.; compound names such as
Câmara municipal de Reguengos de Monsaraz (Reguengos de Monsaraz town hall),
convenção dos Direitos Humanos (Human Rights convention), etc.; compound verbs
such as levar a cabo (to get, to carry out, to implement), ter em conta (to take into
account), etc.; frozen forms such as em todo o caso (anyway), segundo consta
(according to what is said), etc., and other n-grams occurring relatively frequently
and having strong "glue" among the component words of the n-gram such as tanta e
tão boa (so much and so good), afectadas pela guerra civil (afflicted by the civil war).

The Results 

Table 2. Scores obtained by assigning several statistics-based association measures

Statistics-based measure: g(.)= | Precision (average) | Extracted MWUs (count)
SCP_f(.)     | 81.00% | 24476
SI_f(.)      | 75.00% | 20906
φ²_f(.)      | 76.00% | 24711
Dice_f(.)    | 58.00% | 32381
LogLike_f(.) | 53.00% | 40602



The Precision column means the average percentage of correct MWUs obtained. It is 
not possible to calculate the exact number of MWUs in the corpus. So, we may 
measure how close to that number is the number of MWUs obtained by each 
statistical measure. As a matter of fact we are not facing the problem of counting very 
well defined objects like nouns or verbs of a corpus, but counting MWUs. So, the 



relatively poor. The enhancements obtained in Precision and Recall when the Fair 
Dispersion is introduced in several statistical measures are shown in [24]. 

¹⁰ This corpus corresponds to the news of some days in January 1994 from Lusa (the
Portuguese News Agency).

¹¹ Note the spelling error in ‘Republica’ that should have been written as ‘República’.
However real corpora are like that and we cannot escape from it, as there are texts that may
reproduce parts of other texts where the graphical form of words does not correspond to
the currently accepted way of writing.

column Extracted MWUs, which gives the number of extracted MWUs by the
considered measure¹², works as an indirect measure of Recall.

Although there are very large MWUs, for example the 8-gram Presidente da
câmara municipal de Reguengos de Monsaraz, we have limited the MWUs produced
by the LocalMaxs from 2-grams to 7-grams for reasons of processing time.



Discussion of the Results. As we can see from Table 2, the SCP_f measure gets the
best Precision and a comparatively good value for Extracted MWUs. By using the
LocalMaxs algorithm with any of the statistics-based measures SCP_f, SI_f or φ²_f, a
good Precision is obtained. However SI_f has a relatively lower score for Extracted
MWUs (count). The Dice_f and especially the LogLike_f measures proved not to be
very selective. They extract many expressions (high values for MWUs (count)), but
many of them are not relevant; they just have high frequency, such as dar ao (to give
to the), dos outros (of the others), etc. Moreover, as discussed in [26], the Dice_f
and LogLike_f measures do extract a lot of uninteresting units and fail to extract other
interesting units that are selected by the other three word association measures. Thus,
we have chosen the SCP_f measure to work with the LocalMaxs algorithm in order to
extract contiguous MWUs from corpora.



5.2 Extracting Contiguous MWUs from Different Languages 

We have also tested the LocalMaxs and the SCP_f for different languages on
non-annotated corpora, and we have obtained the following results:



Table 3. LocalMaxs and SCP_f scores for different languages

Language | Precision | Extracted MWUs (count) | Corpus size
English | 77.00% | 8017 | 493191
French | 76.00% | 8980 | 512031
German | 75.00% | 5190 | 454750
Medieval Portuguese | 73.00% | 5451 | 377724



The MWUs of Table 3 were obtained without any morpho-syntactic operation or 
linguistic filter. Although the Precision is not the same for the different languages in 
Table 3, we think this may be due to the different corpus sizes -remember that in the 
case of the Portuguese non-annotated corpus (See Table 2), the corpus size is 919,253 
words and we have got 81% precision. Thus, we believe that for a larger corpus, 
similar precision measures may be attained for different languages. 



¹² We have discarded hapaxes, i.e. every “MWU” or “relevant expression” that occurred just once.
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5.3 Extracting Heavily Inflected Multiword Lexical Units 

Verbs, in Portuguese and other Latin languages, are heavily inflected. They vary in
number, person, gender, mood, tense, and voice. So, in a corpus we can find phrases
such as ele teve em conta que... (he has taken into account that...), ele tem em conta o
preço (he takes into account the price...), eles tinham em conta essas coisas (they took
into account those things), isso foi tomado em conta porque (that was taken into
account because), etc. As a consequence, due to the fact that the same verb can
occur in different forms, it might be the case that we were not extracting every
possible multiword verb phrase. So we needed to have every occurrence of any verb
in a corpus lemmatised to its infinitive form.

In order to obtain that, we used an automatically tagged corpus from which we 
produced the equivalent text by changing just the verb forms to the corresponding 
infinitive forms. Acting this way, the infinitive verb forms get relevance in the 
corpus, avoiding the dispersion by several forms. This results in higher "glue" values 
for the n-grams containing verbs and words with a strong association among them. An 
existing neural network based tagger [22] has been used for tagging a superset of the
previous Portuguese corpus¹³ (i.e. a superset of the one containing 919,253 words
used before). The tagger disambiguates the POS tags assigned to each word. Every
word is tagged and its base form (singular for nouns, singular masculine for
adjectives, infinitive for verbs, etc.) is also provided by the tagger. Then, in order to
obtain the corpus we wanted, the verbs were changed to their infinitive forms except
those in the past participle since they are usually used as adjectives. So, apart from
the verb forms that are not in the past participle, all the words were kept as they
were in the original corpus.
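The corpus transformation just described can be sketched as a simple pass over the tagger output. The token layout (surface form, base form, tag) and the tag names "V" and "VPP" are hypothetical placeholders for this illustration; they do not reproduce the tagset of the tagger in [22].

```python
from typing import FrozenSet, Iterable, List, Tuple

# (surface form, base form, POS tag); the layout and tag names are assumptions.
TaggedToken = Tuple[str, str, str]


def verbs_to_infinitive(tokens: Iterable[TaggedToken],
                        verb_tags: FrozenSet[str] = frozenset({"V"})) -> List[str]:
    """Replace verb forms by their infinitive and keep every other word as is.

    Past participles carry their own tag (here "VPP"), which is not in
    verb_tags, so they are left untouched, as in the procedure described above.
    """
    return [base if tag in verb_tags else surface
            for surface, base, tag in tokens]


finite = [("ele", "ele", "PRON"), ("teve", "ter", "V"),
          ("em", "em", "PREP"), ("conta", "conta", "N")]
participle = [("isso", "isso", "PRON"), ("foi", "ser", "V"),
              ("tomado", "tomar", "VPP"), ("em", "em", "PREP"),
              ("conta", "conta", "N")]
finite_out = verbs_to_infinitive(finite)
participle_out = verbs_to_infinitive(participle)
```

After this pass, teve em conta, tem em conta and tinham em conta all contribute to the counts of the single sequence ter em conta, which is what concentrates the "glue" on the infinitive form.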

The Evaluation Criterion. In order to evaluate the results obtained by applying the
LocalMaxs algorithm and the SCP_f word association measure to the transformed
corpus, we need to recall that contiguous compound verbs may conform to a set of
patterns. Generally these patterns have two or three words:

-Verb+Prep+Noun (pôr em causa -to doubt-, levar a cabo -to get-, ter em atenção
-to beware to-, entrar em vigor -to come into force-, ter por objectivo -to aim-, etc.)
-Verb+Adv+Noun|Adv (ter como lema -to follow-, ir mais longe -to reach farther-,
etc.)
-Verb+Contraction+Noun (ter pela frente -to face-, subir ao poder -to reach the
power-, etc.)
-Verb+Prep+Verb (estar para chegar -to be about arriving-, etc.)
-Verb+Noun|Adv (correr bem -to get succeed-, arredar pé -to leave-, marcar passo
-to stay-, pôr cobro -to put a stop to-, etc.)
-Verb+Adj|Adv (tornar possível -to enable-, tornar público -to divulge-, tornar claro
-to clarify-, etc.)



¹³ This corpus was made from an original corpus corresponding to the news of some
days in January 1994 from Lusa (the Portuguese News Agency).
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However, there are many statistically relevant expressions beginning with a verb that 
might or might not be considered contiguous compound verbs. In any case, NLP 
lexica should take them into account, since there is a strong co-occurrence between 
the verb and the rest of the n-gram. That would be the case of angariar fundos -to get 
funding, to raise funds-, criar emprego —to create jobs-, efectuar contactos -to 
contact-, fazer sentir -to convince-, viver na miséria -to live in extreme poverty-,
aceitar sem reservas -to accept without reserves-, aplicar a pena -to apply the
punishment-, etc. For the purpose of evaluation we have also considered this kind
of relevant expressions as correct contiguous compound verbs.

The Results 



Table 4. The scores for the contiguous compound verb extractions

Form | Precision | Extracted compound verbs¹⁴
2-gram | 81.00% | 108
3-gram | 73.00% | 492



Discussion of the Results. Table 4 shows respectable values for Precision.
Once again, there is no practical way to calculate the total number of compound
verbs existing in the corpus, so we cannot evaluate how close the number of
extracted compound verbs obtained by our approach is to that total (Recall). However,
we would say that 600 (108 + 492) compound verbs extracted from a 1,194,206-word
corpus is a good score, and we believe that a larger corpus would make it possible to
extract a number of compound verbs closer to the number of
compound verbs of the Portuguese language. Despite the high performance of the
Neuronal Tagger (about 85% Precision for verbs and 95% Recall), the scores in
Table 4 also depend on the tagger performance. Appendix A contains a sample of the
compound verbs extracted by our approach.

6 Results for Non-contiguous MWUs 

In this section, we first compare the results obtained by applying the LocalMaxs
algorithm over a Portuguese corpus of political debates with approximately 300,000
words with the Mutual Expectation (ME), the normalised Specific Mutual
Information (SI_n), the normalised φ² (φ²_n), the normalised Dice coefficient



Recall the evaluation criterion explained before.

Precision for POS-tagging is greater than these numbers suggest. As a matter of fact,
some tags are assigned correctly 100% of the time. Verbal tags, especially the past
participle tag, are rather problematic, as the corresponding word most of the time
behaves as an adjective. For a deeper analysis of this subject, see [22].

The authors are aware that the size of this corpus is relatively small. However, we
must point out that working with normalised events reduces the side effect of
corpus length.

The SI_n is the result of the normalisation process of the association ratio [9].
(Dice_n) and the normalised Log-Likelihood ratio (Loglike_n). The results
illustrate that the Mutual Expectation leads to much improved results for the
specific task of non-contiguous multiword lexical unit extraction, as shown in
Table 5.



Table 5. Scores obtained by assigning several statistics-based word association measures

Statistics-based measure g(.)   Precision (average)   Extraction of correct MWUs (count)
ME(.)                           90.00%                1214
SI_n(.)                         61.00%                276
φ²_n(.)                         70.00%                294
Dice_n(.)                       48.00%                474
LogLike_n(.)                    49.00%                1044

The Evaluation Criterion

We first built all the contiguous and non-contiguous n-grams (for n=1 to n=10)
from the Portuguese corpus, assigned to each one its respective association measure
value, and finally ran the LocalMaxs algorithm on this data set. In the case of the
extracted non-contiguous MWUs, we analysed the results obtained for units
containing exactly one gap, leaving for further study the analysis of the units
containing two or more gaps. Indeed, the relevance of such units is difficult to judge
and a case by case analysis is needed. However, the reader may retain the basic idea
that the more gaps a non-contiguous MWU contains, the less meaningful the unit is
and the more likely it is to be an incorrect multiword lexical unit. Another
important point concerning precision and recall rates has to be stressed before
analysing the results. There is no consensus among the research community about
how to evaluate the output of multiword lexical unit extraction systems. Indeed, the
quality of the output strongly depends on the task being tackled. A lexicographer and
a translator may not evaluate the same results in the same manner. A precision
measure should surely be calculated in relation to a particular task. However, in
order to define some "general" rule to measure the precision of the system, we
propose the following two assumptions. Non-contiguous multiword lexical units are
valid units if they are relevant structures such as pela ___ vez (which could be
translated in English by 'for the ___ time') where the gap may be filled in by
occurrences of primeira (first), segunda (second), etc.; or if they are collocations such
as tomar ___ decisão (where the English equivalent would be the expression 'to
take ___ decision') where the gap may be filled in with the articles uma (a) or tal



The φ²_n is the result of the normalisation process of Pearson's coefficient [17].

The Dice_n is the result of the normalisation process of the Dice coefficient [27].

The Loglike_n is the result of the normalisation process of the Log-likelihood [15].
(such a). Finally, a non-contiguous MWU is a valid unit only if the gap corresponds to
the occurrence of at least two different tokens in the corpus. For example, the
non-contiguous n-gram Prémio Europeu ___ Literatura does not satisfy our
definition of precision, as the only token that appears in the corpus at the gap position
is the preposition de. Furthermore, the evaluation of extraction systems is usually
performed with the well-established recall rate. However, we do not present the
"classical" recall rate in this experiment due to the lack of a reference corpus where
all the multiword lexical units are identified. Instead, we present the number of
correctly extracted non-contiguous MWUs.
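
The last validity condition (a gap must be filled by at least two different tokens) can be checked mechanically. The following is a minimal sketch, assuming a tokenised corpus and a gap exactly one token wide; the function names and the toy token list are illustrative and do not come from the experiment.

```python
def gap_fillers(tokens, left, right):
    """Return the set of tokens seen between `left` and `right`.

    Assumption: the gap is exactly one token wide.
    """
    fillers = set()
    span = len(left) + 1 + len(right)
    for start in range(len(tokens) - span + 1):
        gap = start + len(left)
        if tokens[start:gap] == left and tokens[gap + 1:start + span] == right:
            fillers.add(tokens[gap])
    return fillers


def is_valid_unit(tokens, left, right):
    """A one-gap unit is kept only if at least two different tokens fill the gap."""
    return len(gap_fillers(tokens, left, right)) >= 2


# Toy token list (invented for illustration, not taken from the corpus).
tokens = ("pela primeira vez o premio europeu de literatura "
          "pela segunda vez o premio europeu de literatura").split()

print(is_valid_unit(tokens, ["pela"], ["vez"]))                      # two fillers
print(is_valid_unit(tokens, ["premio", "europeu"], ["literatura"]))  # only "de"
```

In this toy data the first unit is accepted because the gap is filled by both primeira and segunda, while the second is rejected because de is the only filler.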



6.1 The Discussion of the Results 

From the results of Table 2 and Table 5, one can acknowledge that the
non-contiguous rigid multiword units are less expressive in this sub-language than are the
contiguous multiword units. Nevertheless, their average frequency is very similar to
that of the extracted contiguous multiword units, showing that they do not embody
exceptions and that they reveal interesting phenomena of the sub-language. Some
results are given in Appendix B.

The Mutual Expectation shows significant improvements in terms of Precision and
Recall in relation to all the other measures. The most important drawback that we
can express against all the measures presented by the four other authors is that they
raise the typical problem of high frequency words, as they depend heavily on the
marginal probabilities. Indeed, they underestimate the degree of cohesiveness when
the marginal probability of one word is high. For instance, the SI_n, the Dice_n, the
φ²_n and the Loglike_n elect the non-contiguous multiword lexical unit turcos ___
curdos (Turkish ___ Kurdish) although the probability that the conjunction e (and)
fills in the gap is one. In fact, the 3-gram [turcos 1 e 2 curdos] unjustifiably gets
a lower value of cohesiveness than the 2-gram [turcos 2 curdos]. Indeed,
the high frequency of the conjunction e lowers the cohesiveness value of the
3-gram. By contrast, as the Mutual Expectation does not depend on marginal
probabilities except for the case of 2-grams, it elects the longer MWU refugiados
políticos turcos e curdos, in agreement with the concordances shown in
Table 6. So, all the non-contiguous multiword lexical units extracted with the Mutual
Expectation measure define correct units, as the gaps correspond to the occurrences of
at least two different tokens. The problem shown by the other measures is reflected
in their low precision rates.
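
The sensitivity to marginal frequencies described above can be illustrated with a small sketch. It uses the classical, un-normalised Dice coefficient on word pairs rather than the normalised Dice_n evaluated in Table 5, and all counts are invented for illustration.

```python
def dice(freq_x, freq_y, freq_xy):
    """Classical Dice coefficient for a word pair: 2*f(x,y) / (f(x) + f(y))."""
    return 2 * freq_xy / (freq_x + freq_y)


# Hypothetical counts: 'turcos' and 'curdos' always occur together,
# while the conjunction 'e' is extremely frequent everywhere in the corpus.
freq = {"turcos": 5, "curdos": 5, "e": 10_000}
together = 5  # every occurrence of 'turcos' is followed by 'e' and then 'curdos'

score_turcos_curdos = dice(freq["turcos"], freq["curdos"], together)
score_turcos_e = dice(freq["turcos"], freq["e"], together)

print(f"turcos ... curdos: {score_turcos_curdos:.4f}")
print(f"turcos e:          {score_turcos_e:.4f}")
```

Even though e follows turcos in every one of its occurrences, the pair scores close to zero because the frequency of e dominates the denominator.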



Table 6: Concordances for turcos ___ curdos

greve da fome sete    refugiados políticos turcos e curdos   na Grécia sete
greve da fome dos     refugiados políticos turcos e curdos   em protesto contra
interesses dos sete   refugiados políticos turcos e curdos   que fazem greve de
na Grécia sete        refugiados políticos turcos e curdos   que estão detidos e
semanas onde sete     refugiados políticos turcos e curdos   estão presos em



A more detailed analysis can be found in [13]. 
6.2 Extracting Non-contiguous MWUs from Different Languages

A comparative study over a Portuguese, French, English and Italian parallel corpus
has been carried out in [14] and has illustrated that the concept of multiword lexical
unit embodies a great deal of cross-language regularities beyond just flexibility and
grammatical rules, namely occurrence and length distribution consistencies. In all
cases, the Mutual Expectation gave the most encouraging results.



7 Assessment of Related Work 

Several approaches are currently used to extract automatically relevant
expressions for NLP lexica and Information Retrieval. In this section, we
discuss some related work focusing on the extraction of these expressions and some of its
applications.

In non-statistical approaches, syntactic patterns of occurrence that generally
enable the retrieval of adequate compounds are searched for. Generally they do not go
beyond 5-grams. For example, [4] and [5] search for those patterns in partially parsed
corpora (treebanks). However, Barkema recognises that the occurrence of a pattern
does not necessarily mean that compound terms have been found. [20] also uses this
kind of pattern matching and then generates variations by inflection or by derivation
and checks whether those possible units do appear in the corpus used. More recent work can
be found in [6]. These approaches are soon limited by the availability of POS-tagged
corpora, and the precision that they enable is surely very low. They rely mostly
on human selection.

The works proposed by Barkema and Jacquemin suffer from their language
dependency, requiring a specialised linguistic analysis to identify clues that isolate
possible candidate terms. In order to scale up the acquisition process, [12] explores a
method in which co-occurrences of interest are defined in terms of surface syntactic
relationships and then are filtered by means of the likelihood ratio statistics. In a
first round, only base-terms (i.e. terms of length two) that match a list of previously
determined syntactic patterns (noun phrase patterns) are extracted from a tagged
corpus. Then, as the patterns that characterise base-terms can be expressed by regular
expressions, a finite automaton is used to compute the frequency of each candidate
base-term. In order to do so, each base-term is classified into a pair of lemmas (i.e. two
main items) and its frequency represents the number of times the two lemmas of the
pair appear in one of the allowed morpho-syntactic patterns. Finally, the
Log-Likelihood ratio statistics is applied as an additional statistical filter in order to isolate
terms among the list of candidates. However, the attained precision and recall rates
are not presented in Daille's work. Her approach requires a lot of morpho-syntactic
work to extract any relevant expression, since its statistical part does not
measure the correlation of n-grams longer than 2. We believe that, by
using statistics to measure the "glue" sticking together the whole n-gram, whatever
the n-gram length is, rather than just measuring the "glue" for 2-grams, the same or
better results could have been obtained in a more comfortable way. Moreover, in
Daille's work, the Loglike criterion is selected as the best. Like Daille, we also think
that this result is due to the fact that statistics were applied after the linguistic filters.
Indeed, as we can infer from Tables 2 and 5, the precision attained by this criterion
after correction with the Fair Dispersion Point Normalisation and the Fair Point of
Expectation is rather low.
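
For reference, the statistical filter mentioned above can be sketched for a single 2-gram. This is the usual G² form of the log-likelihood ratio over a 2x2 contingency table; the function name and the counts are illustrative, and the sketch is not claimed to reproduce Daille's exact implementation.

```python
import math


def log_likelihood_ratio(k11, k12, k21, k22):
    """G^2 statistic for a 2x2 contingency table of bigram counts.

    k11: w1 followed by w2        k12: w1 followed by another word
    k21: another word before w2   k22: neither w1 nor w2
    """
    total = k11 + k12 + k21 + k22
    row_sums = (k11 + k12, k21 + k22)
    col_sums = (k11 + k21, k12 + k22)
    observed = ((k11, k12), (k21, k22))
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g2 += o * math.log(o / expected)
    return 2.0 * g2


# Hypothetical counts: an independent pair versus a strongly associated pair.
independent = log_likelihood_ratio(10, 10, 10, 10)
associated = log_likelihood_ratio(30, 5, 5, 960)
print(independent, associated)
```

The statistic is zero when the observed counts equal those expected under independence and grows as the pair co-occurs more often than chance would predict. As the text notes, it is defined here for pairs only.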

Linguistic approaches, whether or not combined with statistical methods, present two major
drawbacks. By reducing the search space to groups of words that correspond
uniquely to particular noun phrase structures, such systems do not deal with a great
proportion of multiword lexical units such as compound verbs, adverbial locutions,
prepositional locutions, conjunctive locutions and frozen forms. The study made by
[19] shows that they represent 22.4% of the total number of the MWUs in their
specialised corpus. Furthermore, [18] points out that multiword lexical units
may embody specific grammatical regularities and specific flexibility constraints
across domains. As a consequence, linguistic approaches need to be tuned for each
new domain of application.

Smadja [29] proposes a statistical method (XTRACT) to
retrieve collocations by combining 2-grams whose co-occurrences are greater than a
given threshold. In the first stage, pairwise lexical relations are retrieved using only
statistical information. Significant 2-grams are extracted if the z-score of a word pair
exceeds a threshold that has to be determined by the experimenter and that
depends on the use of the retrieved collocations. In the second stage, multiple-word
combinations and complex expressions are identified. For each 2-gram identified at
the previous stage, XTRACT examines all instances of appearance of the two words
and analyses the distributions of words and part-of-speech tags in the surrounding
positions. If the probability of the occurrence of a word or a part-of-speech tag around
the 2-gram being analysed is above a given threshold, then the word or the part
of speech is kept to form an n-gram. Although Smadja's methodology is more
flexible than the studies previously described, it relies on the ad hoc establishment of
association measure thresholds, which is prone to error, and on association measures
only defined for bigrams.
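
The threshold dependence criticised above can be made concrete with a sketch of the first stage only. It assumes that the z-score of a collocate is taken over the co-occurrence counts of one head word; the counts, the head word and the threshold value are all invented, and the function is not XTRACT's actual code.

```python
from statistics import mean, pstdev


def significant_collocates(cooccurrence_counts, threshold):
    """Keep collocates whose z-score exceeds `threshold`.

    The z-score compares a collocate's count with the mean and the
    standard deviation of all collocate counts of the same head word.
    """
    counts = list(cooccurrence_counts.values())
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return {}
    scores = {w: (c - mu) / sigma for w, c in cooccurrence_counts.items()}
    return {w: z for w, z in scores.items() if z > threshold}


# Hypothetical co-occurrence counts for the head word 'greve'.
greve_collocates = {"fome": 40, "geral": 12, "de": 9, "a": 8, "hoje": 6}
selected = significant_collocates(greve_collocates, threshold=1.0)
print(selected)
```

Changing the threshold changes which pairs survive, which is exactly the experimenter-dependent choice that the local maximum criterion avoids.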



8 Conclusions and Future Work 

By combining the LocalMaxs algorithm with n-gram-length normalised association
measures, it is possible to extract automatically from raw texts relevant multiword
lexical units of any size and thus populate lexica for applications in NLP,
Information Retrieval, Information Extraction and Text Mining.

The different natures and structures of contiguous and non-contiguous
multiword lexical units have led us to elaborate two distinct methodologies in order to
retrieve each particular kind of unit. The studies and experiments showed that two
skilled association measures (namely in their smoothing -normalisation- techniques)
were needed: the SCP with the Fair Dispersion Point Normalisation for contiguous
MWUs and the ME with the Fair Point of Expectation for non-contiguous MWUs.

The results obtained by comparing both the SCP and the ME with the Specific
Mutual Information [9], the φ² [17], the Dice coefficient [27] and the Log-Likelihood
ratio [15] allow us to point out that these newly introduced measures are
specifically designed for the extraction of MWUs within the concept of local maxima
embodied by the LocalMaxs algorithm. However, we believe that there are neither
universal association measures nor absolute selection processes. Indeed, the
experience we have acquired also makes us believe that for different applications one
may consider different methodologies.

The statistical nature of our approaches confirms that the LocalMaxs algorithm is
a more robust approach, reducing the commitments associated with the complexity of the
language. Besides, the local maximum criterion avoids the definition of global
thresholds for frequency and/or association measures. We did not have the time to compare the two
smoothing strategies proposed in this paper. So, as future work, we intend to improve
our methodologies by cross-studying the results of each association measure and
smoothing strategy in order to find the most fruitful combinations.
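
To show what a selection without global thresholds looks like, here is a minimal sketch of a local maximum criterion over contiguous n-grams. It assumes the commonly cited formulation of LocalMaxs (an n-gram is kept when its score is at least that of the (n-1)-grams it contains and strictly greater than that of the (n+1)-grams containing it); the exact inequalities are an assumption, since this section does not restate them, and the scores are invented.

```python
def local_maxs(glue):
    """Select n-grams whose association score is a local maximum.

    `glue` maps n-gram tuples to association scores (hypothetical values).
    """
    selected = []
    for ngram, score in glue.items():
        n = len(ngram)
        if n < 2:
            continue
        # Scores of the contiguous (n-1)-grams inside this n-gram, when known.
        subs = [glue[s] for s in (ngram[:-1], ngram[1:])
                if len(s) >= 2 and s in glue]
        # Scores of the contiguous (n+1)-grams that contain this n-gram.
        supers = [g for other, g in glue.items()
                  if len(other) == n + 1
                  and (other[:-1] == ngram or other[1:] == ngram)]
        beats_subs = n == 2 or all(score >= s for s in subs)
        beats_supers = all(score > s for s in supers)
        if beats_subs and beats_supers:
            selected.append(ngram)
    return selected


# Invented scores for a handful of overlapping n-grams.
glue = {
    ("conselho", "de"): 0.10,
    ("de", "ministros"): 0.15,
    ("conselho", "de", "ministros"): 0.60,
    ("conselho", "de", "ministros", "aprovou"): 0.05,
    ("o", "conselho", "de", "ministros"): 0.08,
}
print(local_maxs(glue))
```

No frequency or score threshold appears anywhere: each candidate is compared only with its immediate neighbours in length.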



Appendix A: A Sample of Contiguous Compound Verbs Extraction 



Ter em atenção (to bear in mind)
Ter em conta (to have in mind)
Tomar uma decisão (to take a decision)
Travar de razões (to discuss)
Usar a força (to use the police force)
Atingir o limite (to attain the limit)
Cantar as janeiras (to sing the janeiras)
Causar a morte (to cause the death)
Estar para durar (to be to last)
Entrar em vigor (to come into force)
Levar a cabo (to carry out)
Levar a efeito (to carry out)
Pôr em causa (to doubt)
Dar explicações (to explain)
Dar prioridade (to give priority)
Marcar passo (to stay)
Pôr termo (to make an end of)
Tirar conclusões (to conclude)
Tornar público (to divulge)
Trocar impressões (to exchange ideas)



Appendix B: A Sample of Non-contiguous MWUs Extraction 

Greve ___ fome (hunger strike)
Progressos ___ registados (registered improvements)
Tomar ___ decisão (to take ___ decision)
Distorções ___ concorrência (competition distortions; distortion of the competition)
Presença ___ observadores (observers presence; presence of the observers)
Taxas ___ IVA (VAT rates; rates of the VAT)
Um número de (a number of)
Uma lista ___ projectos (projects list; list of the projects)
Transporte de ___ perigosas (transport of dangerous ___)
Proposta de ___ do Conselho (proposal of ___ of the Council)

For Greve ___ fome, there is no gap in the English translation, as the unit is the
result of two different ways of writing the same concept: greve da fome and greve de fome.
Abstract. The distinction between literal and figurative language (metonymies,
metaphors, etc.) is often not made formally explicit, or, if
formal criteria exist, they are insufficient. This poses problems for an adequate
computational treatment of these phenomena. The basic criterion for
delineating literal from figurative speech we propose is centered around
the notion of categorization conflicts that follow from the context of the
utterance. In addition, we consider the problem of granularity, which is
posed by the dependence of our approach on the underlying ontology.

1 Introduction 

Figurative speech comes in different varieties (e.g., the metonymy in example
(2) and the metaphor in example (3) below), and is typically contrasted with
literal language use (e.g., example (1)) on the basis of some notion of deviance.

(1) "The man left without paying."

(2) "The ham sandwich left without paying."

(3) "The Internet is a gold mine."

Currently, two approaches prevail, which spell out this distinction. The first
one […] simply regards deviation from literal reference as a sufficient condition
for figurativeness. This does not explain, however, what the formal criteria for
deviation actually are. Thus, the discrimination of literal and figurative meaning
rests on subjective ascription.

The second approach […, 27, 24] introduces such a formal criterion, one which
depends on the notion of "violation of norms", selectional restrictions of verbs
and nouns, in particular. Each time these are violated, e.g., through type
conflicts, an instance of figurative speech is encountered. As a consequence, special
reasoning patterns, such as type coercion (for metonymies [24]) or analogy-based
structure mapping (for metaphors [2, 7]), are activated in order to cope with the
triggering instance such that a reasonable interpretation can be derived that
no longer violates the underlying constraints. The proponents of this
approach present a lot of supporting evidence for their methodological claims (cf.,
e.g., example (4)) but obviously fail to cover a wide range of residual
phenomena (as can be seen from the example (5) that lacks any violation though being
figurative without doubt):

(4) "I read Chaucer."

(5) "I like Chaucer."

P. Barahona and J.J. Alferes (Eds.): EPIA'99, LNAI 1695, pp. 133-147, 1999.

© Springer-Verlag Berlin Heidelberg 1999

In this paper, we subscribe to the derivational paradigm, i.e., the requirement
to generate figurative speech from specifications of literal language use. Thus, we
also regard figurative meaning as a deviation from literal meaning, but will not
be content to leave the description at that. Rather than formalizing the notion of
deviation with recourse to selectional restrictions we will base our distinction
on conceptual criteria that also incorporate the influence from the context of
an utterance. In addition, we share the assumption that the distinction between
literal and figurative speech derives from one's conceptualization and is therefore
subjective, but we will provide a formalization of this dependency on ontology.
We will also identify at least one case of ontological dependency (the problem of
granularity) where subjectiveness can be overcome by taking additional formal
criteria into account.

2 Lexical Meaning 

We will base our considerations on the notion of context-independent lexical
meaning of lexemes, from which the notions of literal and figurative meaning
in context will be derived. Lexical meaning will be a function from lexemes to
categories (concepts) of an ontology.

So, let $\mathcal{L}$ be the set of lexemes of a given natural language and let
$\mathcal{L}' \subset \mathcal{L}$ be the subset of lexemes containing nouns, full verbs
and adjectives only (e.g., man or policeman are elements of $\mathcal{L}'$). We also
assume an ontology composed of a set of concept types
$\mathcal{F}$ = {Man, Policeman, Sandwich, ...}, a set of instances
$\mathcal{I}$ = {man-1, policeman-2, ...} and a set of relations
$\mathcal{R}$ = {has-part, part-of, agent, ...} (we take a set-theoretical semantics
for granted, as is commonly assumed in description logics […]). The lexical meaning
can then be defined as a relation
$\beta_{lex} \subseteq \mathcal{L}' \times (\mathcal{F} \cup \mathcal{I})$.

While we refrain from considering the linkage between lexemes and ontological
entities in depth (cf., e.g., [1] or […]), we require the relation $\beta_{lex}$ to fulfill
the following properties:

1. If lexeme $\in \mathcal{L}'$ is a proper name, then a unique lexeme.i
$\in \mathcal{F} \cup \mathcal{I}$ with (lexeme, lexeme.i) $\in \beta_{lex}$ exists such
that lexeme.i $\in \mathcal{I}$. Thus, every proper name is linked to a single
instance in the domain knowledge base.

2. If lexeme $\in \mathcal{L}'$ is not a proper name, then we require a concept
lexeme.CON $\in \mathcal{F}$ to exist such that (lexeme, lexeme.CON) $\in \beta_{lex}$.
In addition, no instance lexeme.i $\in \mathcal{I}$ exists such that
(lexeme, lexeme.i) $\in \beta_{lex}$.

3. For reasons of simplicity, we will now restrict $\beta_{lex}$ appropriately. If lexeme
$\in \mathcal{L}'$ is not a proper name, we require for all $i \in \beta_{lex}$(lexeme)
that i can be referred to by lexeme in a context-independent way. So, we assume that
reference to any i via lexeme is always possible. (We cannot, e.g., relate the
lexeme fool to Man, as not every man can be referenced by fool independent
of the context.) The condition of context-independence may, however, still
hold for several concepts that stand in a subsumption relation to each other.
When we regard the lexeme man, this condition holds for both the concepts
Man and Policeman, as all i $\in$ Policeman and all i $\in$ Man can be
referenced by man. We then regard the most general concept (here, Man) as
the lexical meaning, and, in general, consider lexical meaning as a function^1
$\beta_{lex} : \mathcal{L}' \to \mathcal{F} \cup \mathcal{I}$. By convention, we denote
$\beta_{lex}$(lexeme) by lexeme.CON.

Lexical meaning is thus considered as a context-independent function from
lexemes to categories (concepts) of an ontology. As there is no agreement on
canonical ontologies, this mapping introduces subjective conceptualizations.

Finally, we extend our definition to words w of a discourse so that their
corresponding lexeme is w.lex $\in \mathcal{L}'$. We simply assume
$\beta_{lex}(w) := \beta_{lex}(w.lex)$. We distinguish the range of that mapping by
w.i for proper names and w.CON in all other cases. Hence, the lexical meaning of the
word "man" in example (1) is given by Man.^2

3 Literal vs. Figurative Meaning 

While in the previous section we have been dealing with the isolated lexical
meaning of a word only, we will now incorporate the context of an utterance in
which a word appears. Hence (cf. Fig. 1), we here introduce the word w' with
respect to which word w is syntactically related: w' is either head or modifier
of w. Such a dependency relation (either a direct one or a well-defined series
of dependency relations) at the linguistic level induces a corresponding
conceptual relation $r \in \mathcal{R}$ at the ontological level […]. The conceptual
relation r links the conceptual correlates, w.sf and w'.sf, of w and w', respectively.
Accordingly, we may now say that w StandsFor a corresponding domain entity w.sf;
alternatively, w.sf is called the (intended) meaning of w. The comparison of
w.sf with w.CON or w.i lies at the heart of the decision criterion we propose
for judging whether a reading is literal or figurative. So, in the well-known
example (2), "ham sandwich" (= w) StandsFor "the man who ordered the ham
sandwich" (= w.sf), which is distinct from its lexical meaning, Ham-Sandwich
(= w.CON).

We may now consider some examples to distinguish several cases of how w.sf
or w.CON_sf can be related to w.CON or w.i. This will also lead us to clarify the
notion of distinctiveness between the items involved. Let w.sf be an instance
from $\mathcal{I}$, and let w.CON_sf be the least general concept such that
$w.sf \in w.CON_{sf}$.^3 This assumption will be abbreviated as w.sf instance-of w.CON_sf.

^1 In order to make $\beta_{lex}$ a function we assume in the case of polysemy one of several
meaning alternatives to be the primary one from which the others can be derived.
In the case of homonymy, we assume the existence of different lexemes which can be
mapped directly to mutually exclusive concepts.

^2 The lexical meaning of a word w must be distinguished from the concrete referent
of w in the given discourse.

^3 The least general concept w.CON_sf with $w.sf \in w.CON_{sf}$ is the intersection of all
concepts $C \in \mathcal{F}$ with $w.sf \in C$.
Fig. 1. Framework for Contextual Interpretation

In the simplest case, w.sf and w.CON / w.i are related by an instance-of
relation. Then $w.CON_{sf} = w.CON$ holds. In the utterance

(6) "A man left without paying."

we have w = "man" and w' = "left". Furthermore, w.sf = man-1 instance-of
Man $= w.CON = w.CON_{sf}$. In the example (6), we note that lexical meaning
and actual meaning coincide.

If we consider all relations other than equality as deviant, we characterize
a class of phenomena that is certainly larger than the one containing figurative
speech only. Example (7)

(7) "A policeman left without paying. The man lost his job."

illustrates an anaphoric relation between "the man" and "a policeman". With
this, a subsumption relation holds between w.CON_sf (= Policeman) and w.CON
(= Man), which means that w.CON is either more general than or equal to w.CON_sf.
In particular, we have (policeman-2 =) $w.sf \in w.CON$, but not w.sf instance-of
w.CON, in general (as in example (6)).

Loosening ties a bit more, we may abandon the subsumption relation between
w.CON_sf and w.CON as in example (8).

(8) "A policeman left without paying. The fool lost his job."

We have (policeman-2 =) $w.sf \in w.CON$ (= Fool), but the specialization
relation between w.CON_sf (= Policeman) and w.CON (= Fool) no longer
holds. Instead, we are set back to $w.sf \in w.CON_{sf} \cap w.CON$ and, therefore,
$w.CON_{sf} \cap w.CON \neq \emptyset$. We say that w.CON_sf and w.CON are compatible, as no
categorization conflict arises. This also holds for the previous examples in this
section. So, the notion of categorization conflict turns out to become crucial for
our distinction between literalness and figurativeness, the latter being based on
an underlying categorization conflict, whereas the former is not. We summarize
these observations in the following definition:



Definition 1 (Literalness via Syntactic Constraints).

A word w in an utterance U is used according to its literal meaning, if for
every instance $w.sf \in \mathcal{I}$ which w StandsFor, one of the following two conditions
holds:

\[ w.sf = w.i \quad \text{if $w$ is a proper name} \tag{1} \]

\[ w.sf \in w.CON \quad \text{else} \tag{2} \]

Especially, $w.CON_{sf} \cap w.CON \neq \emptyset$ holds for non-proper nouns.

We here restrict the notion of figurative speech to those relationships between
w.sf and the lexical meaning of w in terms of w.CON, which are not inclusive
ones. A literal use of the word w for an instance w.sf instance-of w.CON_sf is
only possible, if $w.CON_{sf} \cap w.CON \neq \emptyset$. If, however, a categorization conflict
occurs, i.e., $w.CON_{sf} \cap w.CON = \emptyset$ holds, we call the use of w figurative. Such a
figurative use is illustrated by the word "ham sandwich" in example (2) or "gold
mine" in (3). Two consequences of this definition deserve special mention:

1. We can determine exactly the place where subjectivity comes in when a
distinction between literalness and figurativeness is made: it is mirrored by
subjectivity in categorization. "fool" in example (8) can only be considered
as literal, if the concepts Fool and Policeman are considered as being
compatible (in the set-theoretic sense introduced above). If one does not share
this conceptualization, this usage of "fool" must be considered as figurative
(or even absurd). Thus, we capture the subjectivity of figurativeness formally
in the ontological premises, not via intuitive considerations.

2. It is also important to note that Definition 1 does not depend on the violation
of selectional restrictions. The example (5) ("I like Chaucer.") allows for
the same analysis as example (4) ("I read Chaucer."), because the intended
patient is, in both cases, Writings-by-Chaucer (= w.sf), although
this is not indicated by selectional restrictions at all. In both cases
$w.CON_{sf} \cap w.CON = \emptyset$, i.e., figurativeness holds.
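
The criterion of Definition 1 for non-proper nouns can be sketched in a few lines. The following is a minimal illustration, assuming that each concept is modelled by the set of its instances; the toy ontology, the instance names other than man-1 and policeman-2, and the function names are hypothetical, and proper names (condition (1)) are not covered.

```python
# Hypothetical toy ontology: each concept is modelled by its set of instances.
ontology = {
    "Man": {"man-1", "policeman-2", "customer-3"},
    "Policeman": {"policeman-2"},
    "Fool": {"policeman-2"},
    "Ham-Sandwich": {"sandwich-4"},
}


def least_general_concept(instance, ontology):
    """w.CON_sf: the intersection of all concepts that contain the instance."""
    containing = [ext for ext in ontology.values() if instance in ext]
    return set.intersection(*containing) if containing else set()


def is_literal(w_sf, w_con, ontology):
    """Condition (2): w.sf must be an element of w.CON."""
    return w_sf in ontology[w_con]


def has_categorization_conflict(w_sf, w_con, ontology):
    """True when w.CON_sf and w.CON have no instance in common."""
    return not (least_general_concept(w_sf, ontology) & ontology[w_con])


print(is_literal("policeman-2", "Man", ontology))           # example (7)
print(is_literal("policeman-2", "Fool", ontology))          # example (8)
print(is_literal("customer-3", "Ham-Sandwich", ontology))   # example (2)
```

Whether example (8) comes out as literal depends entirely on whether policeman-2 is listed under Fool, which is the subjectivity of categorization discussed in the first consequence above.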

4 Granularity 

The (non-)inclusion criterion we have set up for the distinction between literal
and figurative usage of words in Definition 1 introduces a strong tie to the
underlying ontology. The problem this might cause lies in granularity phenomena
of domain knowledge bases and in the general question whether every difference
in conceptualization induces different literal/figurative distinctions. Given
different levels of granularity, it may well happen that a word w StandsFor an
instance w.sf instance-of w.CON_sf with $w.CON_{sf} \cap w.CON = \emptyset$, though,
intuitively, one would rate the usage of w as a literal one. Let us illustrate this case
with two examples. Assume we have a knowledge base $KB_1$ in which Cpu
happens to be part-of the Motherboard, while Motherboard itself turns out
to be part-of Computer. If we analyze the example

(9) "The CPU of the computer . . ."

accordingly, we end up with the determination of a figurative usage for
"computer", since Motherboard $\cap$ Computer $= \emptyset$ (cf. Fig. 2).
Fig. 2. Example (9) Assuming $KB_1$



If we assume, however, an ontological representation in a domain knowledge
base $KB_2$ such that Cpu is an immediate part-of the Computer, then we
derive a literal usage for w (cf. Fig. 3).

Fig. 3. Example (9) Assuming $KB_2$



To get rid of the dependence on knowledge base granularity, to a certain
degree at least, we may derive a weaker condition of literalness from Definition
1. To achieve this, we state that w.sf and w'.sf are related by a conceptual
relation r (technically, w'.sf r w.sf). Thus, for literal usage of w we require:

\[ w'.sf \; r \; w.i \quad \text{if $w$ is a proper name} \tag{3} \]

\[ \exists\, i \in w.CON : \; w'.sf \; r \; i \quad \text{else} \tag{4} \]

(3) immediately follows from (1) in Definition 1 since w'.sf r w.sf (= w.i)
holds. (4) can be deduced from (2) by defining i := w.sf.

Since these conditions provide weaker conditions of literal language use than
the ones we have agreed upon in Definition 1, all literal usages determined by
the strong condition still remain literal (in particular, example (9) is considered
a literal usage of "computer" given $KB_2$). Considering the granularity effects for
example (9) with respect to $KB_1$, however, we may determine the literal usage
of "computer" by the following consideration. As we know that Cpu is part-of
Motherboard, and Motherboard is part-of Computer, we conclude
with the transitivity of the part-of relation that Cpu is part-of Computer.
Hence, criterion (4) is fulfilled. In contradistinction to the examples discussed
previously, we do not have $w.sf \in w.CON$ (cf. criterion (2) from Definition 1), as
w.sf = motherboard-2 holds. So by moving from the strict criteria in Definition
1 to the weaker ones stated by criteria (3) and (4) we are able to incorporate
granularity phenomena of knowledge bases. Note, however, that we cannot abandon
completely our dependence on (the encodings of) the knowledge layer, since
we exploit knowledge about the transitivity of relations to solve the problem.
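The contrast between the strict and the weaker criterion can be illustrated with a small sketch. It is not taken from SynDiKATe: the sets `KB1` and `KB2` and the helpers `strict_literal` and `weak_literal` are hypothetical names, concepts stand in for their instances, and part-of is assumed to be transitive in the constrained sense discussed above.

```python
# Hypothetical toy encodings of the two knowledge bases (direct part-of links only).
KB1 = {("Cpu", "Motherboard"), ("Motherboard", "Computer")}
KB2 = {("Cpu", "Computer")}


def transitive_closure(pairs):
    """Return the transitive closure of a binary relation given as a set of pairs."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def strict_literal(part, whole, part_of):
    """Strict reading: the direct part-of link must end in the lexical concept."""
    return (part, whole) in part_of


def weak_literal(part, whole, part_of):
    """Weaker reading in the spirit of criterion (4): a part-of chain suffices."""
    return (part, whole) in transitive_closure(part_of)


for name, kb in (("KB1", KB1), ("KB2", KB2)):
    print(name,
          "strict:", strict_literal("Cpu", "Computer", kb),
          "weak:", weak_literal("Cpu", "Computer", kb))
```

Under these assumptions the strict test separates the two knowledge bases, whereas the weaker test treats "computer" as literal in both.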



5 Figurativeness and Reference 

One might argue that the problem just discussed, the dependence of the distinction
between literal and figurative usage on knowledge base structures, follows
from the definition of StandsFor. Accordingly, some researchers [19] have proposed
to build the definition of figurative speech upon the notion of reference.
The assumption being made is that w uniquely refers to a knowledge base item
w.ref instance-of w.CON_ref and that figurativeness results from the deviation
of this reference from literal meaning. Although their notion of deviance is not
formalized, referentially-based literalness can now be defined straightforwardly
in our approach by proceeding exactly along the lines of Definition 1:

Definition 2 (Literalness in the Referential Approach).

A word w is called literal in the referential approach, if:

$$w.ref = w.i \quad \text{if } w \text{ is a proper name} \qquad (5)$$

$$w.ref \in w.CON \quad \text{else} \qquad (6)$$

Without doubt, we here circumvent the granularity problem, since no change
in reference occurs for example (9), no matter whether $KB_1$ or $KB_2$ is assumed.
But the reference approach runs into severe problems when one considers, e.g.,
classical examples of metonymies such as

(10) "I like to read Chaucer. He was a great writer."

We have w = "Chaucer" as a typical example for a writer-for-writings metonymy.
The assumption to link literal/figurative usage to reference relations is
flawed by the fact that w = "Chaucer" does not refer to the "works of Chaucer",
because in this case the referentially determined anaphor "He" could not be
resolved. In particular, we have Chaucer.ref = Chaucer, therefore w.ref = w.i.
Hence, "Chaucer" must be considered, counterintuitively, as a literal use according
to criterion (5) (similar problems have been discussed at length by Stallard
[27]). Given our context-dependent definitions (1) or (3), we get w.sf = Writings-by-Chaucer
so that $w.sf \neq \text{Chaucer}$. Thus, w = "Chaucer" is analyzed figuratively
in our approach.

We are aware of empirical observations about the transitivity of part-whole relations,
in particular those pertaining to the claim that any of the subrelations of part-whole
are transitive, while the general part-whole relation usually is not. When we talk
about the transitivity of part-whole, we mean this constrained type of transitivity.

Note that Definition 2 is, nevertheless, still dependent on the knowledge base and
on the lexical meaning of w.

Summarizing, we combine criteria (1) to (6) by the following conditions:

1. A word w is used in its literal meaning in all definitions, if $w.sf \in w.CON$
   (analogously, w.sf = w.i) for all w.sf, and $w.ref \in w.CON$ hold (combining
   the referential Definition 2 and Definition 1 with respect to literal usage).

2. If $w.sf \notin w.CON$ (analogously, $w.sf \neq w.i$) for some w.sf (with respect to
   a relation r and another word w'), but $w.ref \in w.CON$ (w.ref = w.i), two
   cases must be distinguished:

   - In cases of granularity effects criterion (4) holds. By this, an $i \in w.CON$
     exists with w'.sf r i (analogously, w'.sf r w.i). We can include this in
     our definition of literal usage as its analysis is only due to implications
     a particular ontology design brings to bear.

   - In cases of figurative speech like the one in example (10) the criteria
     (3)/(4) do not hold. We may include these cases into our definition of
     figurative usage.

3. A word w is used in its figurative meaning according to the syntactic and
   the referential definition, if $w.ref \notin w.CON$ holds and there exists a
   $w.sf \notin w.CON$. This is the case, e.g., in example (2).

So far, we have only considered the figurative usage of a word w. We may
end up by defining an utterance as a figurative utterance, if it contains at least
one word w which is used in a figurative way.
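The three conditions can be read as a small decision procedure. The following is only a sketch of that reading: `classify_word` and `utterance_is_figurative` are hypothetical helper names, and the boolean inputs are assumed to have been computed beforehand from the knowledge base.

```python
def classify_word(sf_in_con, ref_in_con, weak_criterion_holds=False):
    """Classify one word following conditions 1-3.

    sf_in_con            -- list of booleans, one per StandsFor item w.sf,
                            True if that item is an instance of w.CON
    ref_in_con           -- True if w.ref is an instance of w.CON
    weak_criterion_holds -- True if criterion (3)/(4) holds (granularity effect)
    """
    all_sf_literal = all(sf_in_con)
    if all_sf_literal and ref_in_con:
        return "literal"                       # condition 1
    if not all_sf_literal and ref_in_con:
        # condition 2: granularity effect versus genuine figurative speech
        return "literal" if weak_criterion_holds else "figurative"
    if not all_sf_literal and not ref_in_con:
        return "figurative"                    # condition 3
    return "undetermined"                      # case not covered by the conditions


def utterance_is_figurative(word_labels):
    """An utterance is figurative if at least one of its words is."""
    return any(label == "figurative" for label in word_labels)


# "computer" in example (9) under the fine-grained knowledge base
print(classify_word([False], True, weak_criterion_holds=True))
# "Chaucer" in example (10)
print(classify_word([False], True, weak_criterion_holds=False))
```

The fourth return value marks a combination that the three conditions do not mention; it is included only so that the function is total.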

6 Putting Theory to Practice: Metonymy Resolution

So far, we have considered figurativeness from an entirely theoretical perspective.
Our interest in this topic can be traced back, however, to practical requirements
which emerge from SynDiKATe, a text understanding system for processing
product reviews from the IT domain as well as medical finding reports.
The need to deal with a particular type of figurative language, metonymic
expressions, becomes evident when one considers the quantitative distribution
of referential text phenomena we determined for a random sample of 26 texts.
In 606 utterances we encountered 103 metonymies together with 291 nominal
anaphora and 351 bridging anaphora [20]. With a metonymic expression encountered
in almost every sixth utterance, an uncontroversial need for dealing with
this problem is demonstrated. We also collected empirical evidence that a 16%
increase in the accuracy of the anaphora resolution procedure can be achieved
by the incorporation of a metonymy resolution process (due to the overlapping
of metonymies and nominal anaphora).
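The "almost every sixth utterance" figure follows directly from the two counts reported above. As a quick check (the function name is a hypothetical helper, and the reading as a per-utterance rate assumes the metonymies are spread over different utterances):

```python
def utterances_per_metonymy(utterances, metonymies):
    """Average number of utterances per metonymic expression."""
    return utterances / metonymies


ratio = utterances_per_metonymy(606, 103)   # counts reported for the 26-text sample
rate = 103 / 606                            # metonymies per utterance
print(f"one metonymy per {ratio:.2f} utterances (rate {rate:.3f})")
```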

SynDiKATe builds on a fully lexicalized dependency grammar. Parse
trees link lexical nodes via a well-defined set of dependency relations (cf. genitive-attribute
as illustrated in Figures 2 and 3). For semantic interpretation [25], the
conceptual correlates of the lexical items in the dependency graph need to be
linked via conceptual relations. This linkage is based on conceptual constraints.
One major source of these constraints are dependency relations, since they are
often characterized by a hard-wired mapping onto general conceptual relations.
For instance, the dependency relations subject and direct-object, by convention,
map to the conceptual relations agent or patient, and patient, respectively. There
are types of dependency relations, e.g., genitive-attribute, which have no associated
conceptual constraints. In such cases, a fairly unconstrained search of the
knowledge base for proper linkage is performed resulting, e.g., in the determination
of the part-of relation (chain) between Cpu-1 and Computer-2 in Fig. 3 for
the lexical correspondents "Cpu" and "computer" linked via genitive-attribute.

A methodology for intersentential anaphora resolution, given these interpretations,
is described in [28].

We compute possible StandsFor relationships relevant for the distinction between
literal and figurative language by looking at the paths found as a result
of the search of the knowledge base for conceptual relations. In particular, we look
at the intermediate concept nodes a path contains in order to determine the
compatibility between those concepts and the lexical meaning. For illustration
purposes, a few examples of paths will be discussed subsequently. When the
different interpretations are computed and distinguished, these distinctions can
then be used for disambiguation and preferential ranking.

Consider the following two examples already introduced in this paper. 

(11) "The ham sandwich left (without paying)."

(12) "(I) like Chaucer."

In example (11) we concentrate on the subject dependency relation between
"ham sandwich" and "left". As already remarked, subject maps conceptually
to agent (for active voice) or patient (for passive voice) roles. Concentrating
on the active voice case, the concept Leave has associated a conceptual role,
leave-agent, a subrole of agent, which has a range constraint on Persons. A
Person may order a Product, a possible product being eatables such as
Ham-Sandwiches. A possible linkage between the conceptual correlates of "ham
sandwich" and "left" is, therefore, given by the following path:
(Leave - leave-agent - Person - orders - Ham-Sandwich).

In example (12) we concentrate on the patient relation between "like" and
"Chaucer" as induced by the direct-object relation. In this case, the concept Like
is associated with the conceptual role like-patient, a subrole of patient, which has
no range constraint other than the most general concept AnyThing. AnyThing
subsumes Persons, which Chaucer is an instance of, so that the path
(Like - like-patient - Chaucer) just expresses that Chaucer as a person is liked.
In addition, AnyThing also subsumes Books, which are written-by a particular
Author, one of those being Chaucer. In this case, we have constructed another
path: (Like - like-patient - Book - written-by - Chaucer).

Of course, we restrict the search through the knowledge base, e.g., by conceptual
connectivity criteria (cf. [20] for details).

So when possible paths between w.CON and w'.CON have been determined, the
proposed figurativeness criterion can be used to determine whether the readings
that can be derived from the path are figurative or literal. Let us look at our
example path (Leave - leave-agent - Person - orders - Ham-Sandwich)
again. Possibilities for the StandsFor items left.sf and ham-sandwich.sf and
the conceptual relation r corresponding to the syntactic subject relation are

- $left.sf \in \text{Leave}$ (= left.CON), r = leave-agent and $sandwich.sf \in \text{Person}$
  (with $\text{Person} \cap \text{Ham-Sandwich} = \emptyset$).

- $left.sf \in \text{Person}$ (with $\text{Person} \cap \text{Leave} = \emptyset$), r = orders and
  $sandwich.sf \in \text{Sandwich}$ (= sandwich.CON).

The first possibility gives rise to a literal reading of left and a figurative
reading of ham sandwich when compared to our criterion, whereas the second
gives rise to a figurative reading of left and a literal reading of ham sandwich.

In the Chaucer example possibilities for the StandsFor items like.sf and
chaucer.sf and the conceptual relation r corresponding to the syntactic direct-object
relation include

- $like.sf \in \text{Like}$ (= like.CON), r = like-patient and chaucer.sf = Chaucer
  (= Chaucer.i). This is the only reading that can arise by taking the path
  (Like - like-patient - Chaucer) into account as there are only two concepts and
  one relation in the path. The next two readings arise when looking at the path
  (Like - like-patient - Book - written-by - Chaucer).

- $like.sf \in \text{Like}$ (= like.CON), r = like-patient and $chaucer.sf \in \text{Book}$
  (with Book and Chaucer.i being incompatible).

- $like.sf \in \text{Book}$ (with $\text{Book} \cap \text{Like} = \emptyset$), r = written-by and chaucer.sf
  = Chaucer (= Chaucer.i).

The first possibility accounts for a literal reading of Chaucer and like, whereas
the second and third account for a figurative reading for Chaucer or like.
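The enumeration of readings along a path can be sketched as follows. This is an illustrative simplification, not the SynDiKATe implementation: a path is assumed to be a flat list alternating concepts and relations, the function `readings` is a hypothetical helper, and a StandsFor item counts as literal only when it coincides with the corresponding end point of the path.

```python
def readings(path):
    """Enumerate one reading per relation on a concept/relation path.

    path -- e.g. ["Leave", "leave-agent", "Person", "orders", "Ham-Sandwich"];
            the first concept belongs to the head word, the last to its dependent.
    """
    concepts = path[0::2]
    relations = path[1::2]
    result = []
    for k, relation in enumerate(relations):
        result.append({
            "head_sf": concepts[k],            # concept of the head's StandsFor item
            "r": relation,                     # relation chosen for the syntactic link
            "dep_sf": concepts[k + 1],         # concept of the dependent's StandsFor item
            "head_literal": k == 0,            # head stands for its own concept
            "dep_literal": k == len(relations) - 1,
        })
    return result


ham = ["Leave", "leave-agent", "Person", "orders", "Ham-Sandwich"]
for reading in readings(ham):
    print(reading)
```

For a path with a single relation the only reading is literal for both words, which agrees with the first Chaucer reading above; longer paths yield one reading per relation, each figurative for at least one of the two words.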

It is important to note that the disambiguation of different readings may
proceed along several lines such that this process is not affected by the criterion
of figurativeness itself. The determination of possible readings can, e.g., be
combined with all sorts of disambiguation criteria or search heuristics such as

1. Preference of literal readings over figurative ones, thus preferring the first
   reading in the Chaucer example over the other two readings. One can show
   that an approach based on this criterion only amounts to an approach considering
   selectional restriction violations as necessary for establishing figurative
   readings.

2. Additional world knowledge that determines that a literal reading in the
   Chaucer example is invalid when uttered in 1999.

3. Knowledge about anaphora and discourse relations might be helpful for the
   Ham-Sandwich example, e.g., such that a record of discourse entities is
   kept from which one may derive that a person who has ordered a sandwich
   has already been introduced into the discourse. That person is then likely to be
   referred to again, thus making the figurative reading of ham sandwich more
   likely than the figurative reading of left.






4. Statistical knowledge about linguistic structures that make it more likely 
that a noun phrase be used metonymically than a verb phrase, thus preferring 
the second reading over the third reading in the Chaucer example. 

All these criteria can be combined with our figurativeness criterion so that
the criterion can also provide a testbed for preference heuristics and search
restrictions when analyzing figurative language. In the framework of SynDiKATe,
we have, first, developed path patterns inferred from the figurativeness criterion
that make the distinction between figurativeness and literalness based on paths
between knowledge base concepts easier to determine. A simple example would
be that paths of length 1 always mirror literal readings of both lexical items
involved. We then developed various search and disambiguation heuristics which
take into account the following features (for details, cf. [20]):

- Common (or, typical) figurative relations are preferred over not so common
  ones, thus preferring paths including patterned relations.

- Discourse criteria preferring readings allowing for anaphoric readings over
  ones that do not.

- Only when those criteria do not apply, we prefer literal readings over figurative
  ones.
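One possible way to combine the three preferences is a lexicographic ranking, sketched below. This ordering is an assumption made for illustration; the actual combination used in SynDiKATe is described in [20], and the dictionary keys `patterned`, `anaphoric` and `literal` are hypothetical.

```python
def rank_readings(candidates):
    """Order readings by the three preferences, applied lexicographically.

    Each reading is a dict with boolean entries:
      patterned -- the path contains a common (patterned) figurative relation
      anaphoric -- the reading allows for an anaphoric interpretation
      literal   -- the reading is literal under the figurativeness criterion
    """
    def key(reading):
        # False sorts before True, so each preferred property is negated.
        return (not reading["patterned"],
                not reading["anaphoric"],
                not reading["literal"])
    return sorted(candidates, key=key)


candidates = [
    {"name": "literal", "patterned": False, "anaphoric": False, "literal": True},
    {"name": "rare-figurative", "patterned": False, "anaphoric": False, "literal": False},
    {"name": "patterned-figurative", "patterned": True, "anaphoric": False, "literal": False},
]
print([r["name"] for r in rank_readings(candidates)])
```

Because the literalness test is only the last component of the key, the figurativeness criterion itself stays independent of the heuristics that consume it.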

It is important to note that these kinds of comparisons can only be achieved
when a well-founded distinction of figurative and literal language is possible,
which should not rely on the heuristics themselves, e.g., restricting figurative
speech to cases where a sortal conflict occurs. Instead, an independent ontological
criterion is necessary that can be combined freely with reasonable heuristics.



7 Related Work 

We consider as the main contribution of this paper the introduction of a formal
notion of deviance that is general and simple. To the best of our knowledge, no
comparable work has been done so far on this issue. Although there exist formal
characterizations of metaphors in the framework of analogical reasoning,
these are entirely self-contained, i.e., they account for structural properties of
metaphors (e.g., constraints on domain mappings, aptness conditions), rather
than dealing with the distinction between literal and figurative speech. Even
more serious from a natural language understanding point of view, concrete
metaphorical utterances cannot be translated into the formal systems provided
for metaphor explanation due to the lack of concrete mappings between the
lexical and conceptual level [13].

The state of the art aiming at a clarification of what constitutes, e.g., a
metonymy is characterized by a quote from Lakoff and Johnson [19]. They find
that it is characterized by the use of "one entity to refer to another that is related
to it". It is not clear at all what kind of entities they are talking about. One must
assume that they are referring to some kind of signs because it is not obvious
how other entities can refer to something. But even if we restrict the entities
to linguistic signs there is no restriction on the kind of relatedness between the
objects. For example, relatedness might include class inclusion, similarity or part-whole
relations, but only the latter are included in metonymy in general and the
examples Lakoff and Johnson put forward suggest that it is this conventional
kind of metonymy they are talking about. The same shadowy definitions of
figurative language are then often adopted by theoretical linguists [29, 17, 30, 23]
as well as computational linguists [32]. As a result, it is mostly
not clear at all which phenomena are treated by these approaches and how they
discriminate different varieties of figurative speech from others.

There are other literal paths as well which account for granularity problems.

In addition, a tendency can be observed in more formal approaches, pressed
by the need to find computationally feasible definitions of metaphor or metonymy,
to consider figurative language a violation of selectional restrictions [2, 5, 12, 24]
or communicative norms [9, 26]. Such an approach equates an often used triggering
condition, viz. constraint violation, with the phenomenon of figurative
language (or, subsets, like metonymies). So, it confuses the possible, but not
necessary effects of a phenomenon with the phenomenon to be explained.

Despite the lack of formal rigor in previous work, it is worth investigating
how our formal criterion is compatible with other views on figurative speech from
cognitive linguistics in general. The tendency to see figurative speech rooted in
conceptual categories, as we do, is becoming consensus. The main trend is, e.g.,
to treat metaphors as a means of categorization by way of similarity and,
in principle, to retrace figurative speech to cognitive procedures involving categorization
and (subjective) experience [19, 18]. So, Lakoff and Johnson see
metaphors rooted in our way of conceptualization via mappings. Kittay
and Turner regard some kind of conceptual incompatibilities as the basis of
metaphorization. Nevertheless, they do not explicate their theory of categorization
and incompatibility nor do they recognize that these incompatibilities are
relevant for other kinds of figurative speech, as well as for metaphors in the strict
sense of the word. The dependence of lexical, literal and figurative meaning on
ontologies is, therefore, realized, but no explicit formal treatment is given of particular
problems resulting from the ontological presuppositions. We here regard
the problem of granularity arising from ontological dependence and subjective
experience and propose a solution by relaxing criteria for literal meaning.

The second major contribution of the paper derives from the formal status
of our distinction between literal and figurative meaning. Once such a formal
notion is given, it allows us to characterize subjectivity, so far an entirely informal
notion, by reference to the particular ontology underlying the language
understanding process. The current perspective on the ontological basis is rather
static, viz. different ontologies account for cases in which one person considers
an utterance to be literal, while another considers it to be figurative. Our proposal
introduces flexibility into such an assessment. We aim at adapting different
ontologies such that by way of abstracting away different granularities of representation
structures (e.g., by generalizing more fine-grained representations to
a coarser grain size, as in criterion (4)) disagreement might turn into consensus
(e.g., considering example (9)). Contrary to that, the majority of researchers in
our field of study attribute the difference in opinion to the existence of different,
incompatible ontologies, and leave it with that explanation without further
attempt at smoothing.

The third major proposal we make relates to the contextual embedding of
figurative speech. The criterion we formulate is based only on syntactic relations
that guide conceptual interpretation. In particular, and unlike most algorithmic
accounts [2, 5, 12, 24], it does not rely at all upon the violation of selectional
restrictions (with a notable exception), since this criterion accounts only for
a subset of the phenomena naturally recognized as figurative language. In addition,
the syntax-based proposal we make avoids considering reference changes as
an indicator of figurativeness, as is commonly assumed. It is inspired by Fauconnier's
"connector" function. Though this proposal aims at an embedding
of figurative language into syntax, there exists no formalization of this notion
in relation to an established grammar framework, as we provide, nor is the notion of
conceptual incompatibility formalized. A more formal criticism of the reference
changes proposal was made by Stallard [27] who, nevertheless, then only dealt
with figurative language violating sortal constraints.

Our approach is also compatible with viewing figurative language as regular
and not violating linguistic norms. Whereas literal language is grounded in inclusion
relations to lexical or context-independent meaning, figurative language
is grounded in other relations to lexical meaning. These can, nonetheless, be
systematic and conventionalized relations like part-whole relations or obvious
similarities. Although we have not spelled out these relations, there is no need
to claim that inclusion relations are prior or preferred to other relations. Literal
as well as figurative speech are both grounded in structured relations in
categorization. This is in accordance with the conventional metaphor view first
stipulated by Lakoff and Johnson. It is also in accordance with psycholinguistic
research showing that figurative speech is in most cases as easily understood as
literal speech. This is especially the case if the instance of figurative speech is
conventional, i.e., grounded in conventional, systematic and pervasive ontological
relationships [1]. The essence of this is that pervasive and structured relations
or relations made salient by the context [14] may be as easily available to comprehension
as inclusion relations.

8 Conclusion 

In this paper, we have drawn a distinction between literal and figurative speech
which is based on formal criteria. These are grounded in the solid framework
of description logics, in particular, by relying on its set-theoretical semantics.
We aimed at a comprehensive distinction, one that covers the whole range of
word-centered figurative language use (i.e., metonymies, metaphors, irony) using
general and simple criteria. A major determinant for figurative interpretation is
given by the context of an utterance. As a consequence, contextual criteria are
at the core of any adequate account of figurativeness. Our contextual criteria
are independent of the notion of selectional restrictions, but dependent on the
conceptual interpretation of syntactic relations, in general. Another crucial condition
of whether language use is considered literal or figurative is introduced
by the particular ontology referred to. While earlier formalist approaches appeal
to semantic types, sortal constraints, etc., this is not fully convincing, since the
entire structure and granularity of the theory of the domain being talked about
contributes to the understanding process, whether literally or figuratively based.
In particular, we captured the notion of subjectivity in ontological premises and
explained how granularity problems may be overcome.

The model we presented does not currently account for neologisms, as those
have no a priori lexical meaning, nor for many tricky cases of quantification and
the use of proper names. From a more technical perspective, we have also not
scrutinized the different kinds of relations that are still required to hold between
w.CON_sf and w.CON, if $w.CON_{sf} \cap w.CON = \emptyset$. So, a necessary condition for
figurative speech has been established that needs to be supplemented by sufficient
ones. We also have no criteria available right now to distinguish between various
types of figurative speech (e.g., metaphors vs. irony). By this, we mean specifications
concerning similarity relations for metaphors, contrast relations for irony,
and contiguity relations for metonymies (cf., e.g., [15]). Finally, we stop short
of distinguishing between innovative figurative speech (like in the ham sandwich
example) and conventionalized figurative speech (systematic polysemy [24]).
Abstract. In this ongoing work we present a method to associate with
an object a set of spheres representing its shape (skeleton and thickness)
and orientation. The notion of direction is defined with spheres and
mereotopological relations. The study of the relative position of two directions
leads to the notion of angle. We completely survey the case of
shape representation of 2D convex objects, and propose a construction
algorithm.



Keywords: Qualitative reasoning, spatial reasoning, shape recognition, spatial 
representation 

1 Introduction 

In recent years, the need for reasoning on qualitative data has emerged in several
domains, for instance in robotics (route planning) and in human-machine
interfaces. Accordingly, several works have been devoted to spatial representation
and reasoning about spatial scenes. Classical methods of computation from
Cartesian geometry are efficient for manipulating numerical data, but high-level
representations (needed for qualitative reasoning) are very hard to obtain from
such a numerical framework.

Hence, many works have focused on non-classical representations. The primitive
elements of those theories, called regions, are spatially extended entities.
A region can directly be interpreted as the spatial referent of an object (i.e. the
portion of space occupied by the object). Basic spatial concepts like those of
mereology (part-whole relations) and topology (connection relations) are then
easy to represent with such ontological elements, and various authors have studied
these aspects.

In order to obtain an expressive power similar to classical geometries, the 
following notions also have to be represented: distance, orientation and shape. 

This paper deals with the representation of shape and to some extent with 
orientation. The necessary metrical concepts are introduced through the use of 
spheres as done by [51 and [5]. In order to provide a unified framework based 
on spatial regions, we rely on a classical mereotopological background. In section
2 we present a detailed theory of mereotopology and qualitative distance
based on a limited number of primitives. From this background the fundamental
notion of sphere is constructed. Spheres enable the consideration of metrical
concepts such as distance. The rest of this paper develops as follows: we define
directions of objects and angles between them; the framework thus obtained is
used to represent shape, skeleton and thickness of an object in a 2D space. A
construction algorithm is proposed in the particular case of a convex object.



2 Basic Notions of Geometry: Mereotopology, Distance, 
and Dimension 

In this section we present the primitive elements of the theory and the language
developed to manipulate basic geometrical notions. We use a subset of the
theory of Asher and Vieu, presented in [2]. This theory is more complex than
what is presented below, and deals with the notion of weak contact. We kept
only the mereotopological notions useful for our work.

Other theories such as RCC are suitable, but the framework of Asher and Vieu has
the main advantage of being complete for a particular class of models, as shown in
[2], and of making the topological distinction open/closed, contrary to RCC.
Figure 1 shows the 2D interpretation of a set of exclusive relations (equivalent to
RCC8).




2.1 Asher and Vieu’s Axiomatic 

We use a first-order language with equality. C is the primitive connection relation
from which we can define P (part of), PP (proper part of), O (overlap), PO
(proper overlap), EC (external connection), TP (tangential part of), NTP (non-tangential
part of). For the sake of conciseness, the leading quantifiers have
been omitted. We have also adopted a simplified notation without parentheses
for the predicates. Thus we write Cxy instead of C(x, y). The logical "or" and
the logical "and" will be represented respectively by $\vee$ and $\wedge$, the connector of
implication by $\rightarrow$, and $\equiv$ represents a definition.

A 1 $\mathrm{C}xx$
A 2 $\mathrm{C}xy \rightarrow \mathrm{C}yx$
A 3 $(\forall z\,(\mathrm{C}zx \leftrightarrow \mathrm{C}zy)) \rightarrow x = y$
D 1 $\mathrm{P}xy \equiv \forall z\,(\mathrm{C}zx \rightarrow \mathrm{C}zy)$ (x is part of y)
D 2 $\mathrm{PP}xy \equiv \mathrm{P}xy \wedge \neg\mathrm{P}yx$ (x is a proper part of y)
D 3 $\mathrm{O}xy \equiv \exists z\,(\mathrm{P}zx \wedge \mathrm{P}zy)$ (x and y overlap each other)
D 4 $\mathrm{EC}xy \equiv \mathrm{C}xy \wedge \neg\mathrm{O}xy$ (external connection)
D 5 $\mathrm{TP}xy \equiv \mathrm{P}xy \wedge \exists z\,(\mathrm{EC}zx \wedge \mathrm{EC}zy)$ (x is a tangential part of y)
D 6 $\mathrm{NTP}xy \equiv \mathrm{P}xy \wedge \neg\exists z\,(\mathrm{EC}zx \wedge \mathrm{EC}zy)$ (x is a non-tangential part of y)

The following axioms postulate the existence and uniqueness of some operators:



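A 4 $\exists z\,\forall u\,(\mathrm{C}uz \leftrightarrow (\mathrm{C}ux \vee \mathrm{C}uy))$ (sum $x + y$)

A 5 $\exists y\,\neg\mathrm{C}yx \rightarrow \exists z\,\forall u\,(\mathrm{C}uz \leftrightarrow \exists v\,(\neg\mathrm{C}vx \wedge \mathrm{C}vu))$ (complement $-x$)

A 6 $\exists x\,\forall u\,\mathrm{C}ux$ (existence of a universe, noted $a^*$)

A 7 $\mathrm{O}xy \rightarrow \exists z\,\forall u\,(\mathrm{C}uz \leftrightarrow \exists v\,(\mathrm{P}vx \wedge \mathrm{P}vy \wedge \mathrm{C}vu))$ (intersection, $x \cdot y$)

A 8 $\exists y\,\forall u\,(\mathrm{C}uy \leftrightarrow \exists v\,(\mathrm{NTP}vx \wedge \mathrm{C}vu))$ (interior, $ix$)

A 9 $\exists y\,\neg\mathrm{C}yx \rightarrow \exists z\,\forall u\,(\mathrm{C}uz \leftrightarrow \exists v\,(\neg\mathrm{C}v(i(-x)) \wedge \mathrm{C}vu))$ (closure $cx$)

D 7 $\mathrm{CL}x \equiv (cx = x)$ (definition of a closed)

D 8 $\mathrm{OP}x \equiv (ix = x)$ (definition of an open)

A 10 $(\mathrm{OP}x \wedge \mathrm{OP}y \wedge \mathrm{O}xy) \rightarrow \mathrm{OP}(x \cdot y)$ (fundamental property of opens)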

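We can also define a few useful notions:

D 9 $\mathrm{SP}xy \equiv \neg\mathrm{C}\,cx\,cy$ (separateness)

D 10 $\mathrm{CON}x \equiv \neg(\exists x_1 \exists x_2\,(x = x_1 + x_2 \wedge \mathrm{SP}x_1x_2))$ (self-connectedness)

D 11 $\mathrm{SC}xy \equiv \mathrm{C}xy \wedge \exists z\,(\mathrm{NTP}z(x + y) \wedge \mathrm{O}zx \wedge \mathrm{O}zy \wedge \mathrm{CON}z)$ (x and y are
strongly connected)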



We add the following definitions to this axiomatic, useful to define the notion
of sphere:

D 12 $\mathrm{SCON}x \equiv \neg(\exists u \exists v\,(x = u + v \wedge \neg\mathrm{SC}uv))$ (x is a strongly self-connected
region)

D 13 $\mathrm{SR}x \equiv \mathrm{SCON}x \wedge \mathrm{SCON}(-x)$ (x is a simple region)

A simple region is a self-connected region that is also regular (a one-piece
region with no hole in it).
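To make definitions D 1 to D 6 concrete, the sketch below evaluates them over a small finite universe. This is an illustration only, not a model of the full axiomatic: regions are assumed to be closed intervals with integer end points between 0 and 8, connection is assumed to mean "sharing at least one point", and the function names simply mirror the predicate names. Because the universe is bounded, intervals touching 0 or 8 can behave atypically, so the examples stay in the interior.

```python
from itertools import combinations

# Assumed universe: closed intervals (a, b) with integer end points, 0 <= a < b <= 8.
UNIVERSE = list(combinations(range(9), 2))


def C(x, y):
    """Connection: the two closed intervals share at least one point."""
    return max(x[0], y[0]) <= min(x[1], y[1])


def P(x, y):
    """D 1: everything connected to x is connected to y."""
    return all(C(z, y) for z in UNIVERSE if C(z, x))


def PP(x, y):
    """D 2: proper part."""
    return P(x, y) and not P(y, x)


def O(x, y):
    """D 3: x and y have a common part."""
    return any(P(z, x) and P(z, y) for z in UNIVERSE)


def EC(x, y):
    """D 4: connected without overlapping."""
    return C(x, y) and not O(x, y)


def TP(x, y):
    """D 5: part of y that touches something external to y."""
    return P(x, y) and any(EC(z, x) and EC(z, y) for z in UNIVERSE)


def NTP(x, y):
    """D 6: part of y that is not a tangential part."""
    return P(x, y) and not any(EC(z, x) and EC(z, y) for z in UNIVERSE)


print("EC  [2,3] [3,4]:", EC((2, 3), (3, 4)))
print("TP  [3,4] [3,6]:", TP((3, 4), (3, 6)))
print("NTP [4,5] [3,6]:", NTP((4, 5), (3, 6)))
```

Every predicate is computed from C alone, which mirrors the way the theory derives all mereotopological relations from the single connection primitive.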
2.2 Spheres and Distance

Below is presented a subset of the theory of distance and dimension of Gambarotto,
based on a single primitive F: Fxy means "x can be fitted in y". We
only use the representation of distance and dimension, as we do not need the
notion of measure also provided in this framework. F is defined by the following
axioms:

A 11 $\mathrm{F}xx$ (reflexivity)

A 12 $\mathrm{F}xy \wedge \mathrm{F}yz \rightarrow \mathrm{F}xz$ (transitivity)

A 13 $\mathrm{P}xy \rightarrow \mathrm{F}xy$

Defined as above, the set of regions R with F is a weakly ordered set (F is reflexive
and transitive). When two objects have equal shape and size, they are said to
be congruent (they can of course simply be equal). The relation of congruence
is defined by:

D 14 $x =_{cg} y \equiv \mathrm{F}xy \wedge \mathrm{F}yx$ (x and y are congruent iff x can be fitted in y and
reciprocally.)

It is easy to prove that $=_{cg}$ is an equivalence relation:

P 1 $x =_{cg} x$ (A 11, D 14) (reflexivity)

P 2 $x =_{cg} y \rightarrow y =_{cg} x$ (D 14) (symmetry)

P 3 $x =_{cg} y \wedge y =_{cg} z \rightarrow x =_{cg} z$ (A 12, D 14) (transitivity)

(R, F) is then an ordered set (reflexive (A 11), antisymmetric (D 14), and transitive
(A 12), with $=_{cg}$ as the equivalence relation). This order is not total: it is not
always possible to compare the respective dimension of two objects with F, as
illustrated in figure 2.

Fig. 2. $\neg\mathrm{F}ab$ and $\neg\mathrm{F}ba$
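The fact that F is not a total order on arbitrary regions can be reproduced with a deliberately simplified model. In the sketch below, regions are assumed to be rectangles given as (width, height) pairs and "fitting" is restricted to axis-parallel placement with an optional quarter turn; both restrictions are assumptions made for the example, not part of the theory.

```python
def fits(x, y):
    """F restricted to rectangles: x fits in y, possibly after a quarter turn."""
    (short_x, long_x), (short_y, long_y) = sorted(x), sorted(y)
    return short_x <= short_y and long_x <= long_y


def congruent(x, y):
    """D 14: x and y can be fitted into each other."""
    return fits(x, y) and fits(y, x)


a, b = (1, 4), (2, 2)
print("F a b:", fits(a, b), " F b a:", fits(b, a))     # neither holds
print("congruent (1,4) (4,1):", congruent((1, 4), (4, 1)))
```

The long thin rectangle and the square are incomparable, which is the situation depicted in figure 2, while a rectangle and its rotated copy are congruent without being equal.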



2.3 Dealing with Spheres 

A closed sphere is defined as in [6]: 

D 15 SPHx = CLa; A SRx A \/y{x=cgy A FOxy -A SR(a; — y)) (closed sphere) 
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Fig. 3. The first is a sphere, the others are not 



An object a; is a sphere iff it is a simple region (one piece object and no hole) 
and a congruent object cannot disconnect it (figure El • For the sake of simplicity, 
only closed spheres are considered. 

Spheres are interesting, first because F is a total order on the set of spheres, 
and then, in a more computational spirit, it is very easy, given two spheres, to 
conclude which one can be fitted in the other. A sphere is also the most neutral 
object with regard to shape. 
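
The contrast between the partial order on arbitrary regions and the total order on spheres can be made concrete with a small sketch. It is only an illustration under stated assumptions: the classes `Disc` and `Rect` and the functions `fits`, `congruent` and `rect_fits` are hypothetical names, a sphere is reduced to its radius, and the rectangle test assumes that only quarter-turn rotations are allowed, which is a simplification of F.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Disc:
    """A closed sphere of 2D space, reduced to its radius (illustrative)."""
    radius: float


def fits(x: Disc, y: Disc) -> bool:
    """F restricted to spheres: x can be fitted in y iff x is no larger than y."""
    return x.radius <= y.radius


def congruent(x: Disc, y: Disc) -> bool:
    """D 14: x and y are congruent iff each can be fitted in the other."""
    return fits(x, y) and fits(y, x)


@dataclass(frozen=True)
class Rect:
    """A rectangle, used only to show that F is not total on general regions."""
    width: float
    height: float


def rect_fits(x: Rect, y: Rect) -> bool:
    """Simplified F on rectangles (assumption: quarter-turn rotations only)."""
    a, b = sorted((x.width, x.height))
    c, d = sorted((y.width, y.height))
    return a <= c and b <= d
```

With these definitions, any two discs are comparable, whereas `Rect(1, 5)` and `Rect(3, 3)` cannot be fitted in one another in either direction.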

Definitions on closed spheres are introduced, mainly dealing with the relative 
position of a sphere with respect to other spheres. 

The definitions beneath are derived from Tarski’s work [Zj . All the definitions 
and axioms in this section refer to spheres. 

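D 16 $\mathrm{ID}\,xyz \equiv \mathrm{TPP}\,xz \wedge \mathrm{TPP}\,yz \wedge \forall u \forall v\,(\neg\mathrm{O}\,uz \wedge \neg\mathrm{O}\,vz \wedge \mathrm{EC}\,xu \wedge \mathrm{EC}\,yv \rightarrow \neg\mathrm{O}\,uv)$ 
($x$ and $y$ are internally diametrical wrt $z$) 

D 17 $\mathrm{ED}\,xyz \equiv \mathrm{EC}\,xz \wedge \mathrm{EC}\,yz \wedge \forall u \forall v\,(\neg\mathrm{O}\,uz \wedge \neg\mathrm{O}\,vz \wedge \mathrm{P}\,xu \wedge \mathrm{P}\,yv \rightarrow \neg\mathrm{O}\,uv)$ 
($x$ and $y$ are externally diametrical wrt $z$) 

D 18 $\mathrm{CNC}\,xy \equiv x = y \vee (\mathrm{PP}\,xy \wedge \forall u \forall v\,(\mathrm{ED}\,uvx \wedge \mathrm{TPP}\,uy \wedge \mathrm{TPP}\,vy \rightarrow \mathrm{ID}\,uvy)) \vee (\mathrm{PP}\,yx \wedge \forall u \forall v\,(\mathrm{ED}\,uvy \wedge \mathrm{TPP}\,ux \wedge \mathrm{TPP}\,vx \rightarrow \mathrm{ID}\,uvx))$ 
($x$ is concentric with $y$) 

D 19 $\mathrm{BTW}\,xyz \equiv \exists x' \exists y' \exists z'\,(\mathrm{CNC}\,xx' \wedge \mathrm{CNC}\,yy' \wedge \mathrm{CNC}\,zz' \wedge \mathrm{ED}\,y'z'x')$ 
($x$ is between $y$ and $z$) 

D 20 $\mathrm{LIN}\,xyz \equiv \mathrm{BTW}\,xyz \vee \mathrm{BTW}\,xzy \vee \mathrm{BTW}\,yxz \vee \mathrm{CNC}\,xy \vee \mathrm{CNC}\,yz \vee \mathrm{CNC}\,xz$ 
($x$, $y$ and $z$ are aligned) 

D 21 $\mathrm{SSD}\,xyz \equiv \mathrm{BTW}\,xyz \vee \mathrm{BTW}\,yxz \vee \mathrm{CNC}\,xy$ 
($x$ and $y$ are on the same side of $z$) 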

From this we define the following notion: 

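D 22 $\mathrm{PB}\,xyz \equiv \exists x' \exists y' \exists z'\,(\mathrm{CNC}\,xx' \wedge \mathrm{CNC}\,yy' \wedge \mathrm{CNC}\,zz' \wedge \mathrm{DC}\,y'z' \wedge \mathrm{EC}\,x'y' \wedge \mathrm{EC}\,x'z' \wedge y' =_{cg} z')$ 
($x$ is on the perpendicular bisector of $y$ and $z$, see Fig. 5) 



Fig. 5. x is on the perpendicular bisector of y and z 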



2.4 Definition of Distance 

As already said above, F is a total order on the set of closed spheres. 

The distance between two regions x and y can be made concrete by fitting 
a sphere between them. For any two disconnected regions, the following axiom 
introduces the minimal (wrt F) closed sphere that fits between them (Figure 6): 

A 14 $\mathrm{SP}\,xy \rightarrow \exists z\,(\mathrm{SPH}\,z \wedge \mathrm{EC}\,cx\,z \wedge \mathrm{EC}\,cy\,z \wedge \forall t\,(\mathrm{SPH}\,t \wedge \mathrm{EC}\,t\,cx \wedge \mathrm{EC}\,t\,cy \rightarrow Fzt))$ 
($z$ is the smallest (wrt F) closed sphere externally connected to $x$ and $y$, noted 
$\delta(x, y)$) 




Fig. 6. The sphere z represents the distance between x and y 
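
For the special case where x and y are themselves discs, the sphere introduced by A 14 can be computed directly. The following is a minimal sketch under that assumption, using coordinates that the qualitative theory itself does not need; `Disc` and `distance_sphere` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Disc:
    """A closed disc given by centre coordinates and radius (illustrative)."""
    cx: float
    cy: float
    r: float


def distance_sphere(x: Disc, y: Disc) -> Disc:
    """Smallest disc externally connected to two disconnected discs x and y.

    Its diameter equals the gap between the two boundaries, and its centre
    lies on the line joining the two centres.
    """
    centre_dist = math.hypot(y.cx - x.cx, y.cy - x.cy)
    gap = centre_dist - x.r - y.r
    if gap <= 0:
        raise ValueError("x and y must be disconnected")
    # Unit vector pointing from the centre of x to the centre of y.
    ux, uy = (y.cx - x.cx) / centre_dist, (y.cy - x.cy) / centre_dist
    # The centre sits half a gap beyond the boundary of x.
    offset = x.r + gap / 2
    return Disc(x.cx + ux * offset, x.cy + uy * offset, gap / 2)
```

Comparing two such spheres with F then compares the corresponding distances, which is how the relation $F\,\delta(y, x)\,\delta(y, z)$ is used later in the text.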

3 Direction and Angles 

In this section are defined the notion of direction (similar to the notion of seg- 
ment described in [Bj . The relative position of two so-defined directions are then 
studied, to finally introduce the notion of angle. For the sake of simplicity, we 
only consider 2D-space, higher-dimensional space would only entail a greater 
number of relative positions of two directions to take into account. For instance, 
in 3D-space, it would be necessary to consider the case of non-coplanar direc- 
tions. 

All the variables in this section implicitly refer to spheres, and corresponding 
predicates have thus been omitted. 

3.1 Direction 

A direction is entirely defined by two non concentric spheres: 

D 23 $\mathrm{D}\,xy \equiv \neg\mathrm{CNC}\,xy$ ($x$ and $y$ define a direction) 

Two directions are said to be parallel iff they are equal or there is no sphere 
common to both (i.e. they do not intersect). 

D 24 $xy \parallel x'y' \equiv \mathrm{D}\,xy \wedge \mathrm{D}\,x'y' \wedge (\neg\exists w\,(\mathrm{LIN}\,wxy \wedge \mathrm{LIN}\,wx'y') \vee (\mathrm{LIN}\,x'xy \wedge \mathrm{LIN}\,y'xy))$ 
(the directions $xy$ and $x'y'$ are parallel) 

Fig. 7. Parallelism and intersection 



The following axiom implies the existence of a sphere belonging to two non 
parallel directions and forces the underlying space to be bi-dimensional. 

A 15 $\neg(xy \parallel x'y') \wedge \mathrm{D}\,xy \wedge \mathrm{D}\,x'y' \rightarrow \exists z\,(\mathrm{LIN}\,zxy \wedge \mathrm{LIN}\,zx'y')$ 

If two directions xy and x'y' are not parallel, it is possible to define their 
intersection. 

D 25 $\mathrm{INT}\,zxyx'y' \equiv \neg(xy \parallel x'y') \wedge \mathrm{D}\,xy \wedge \mathrm{D}\,x'y' \wedge \mathrm{LIN}\,zxy \wedge \mathrm{LIN}\,zx'y'$ 
($z$ is at the intersection of $xy$ and $x'y'$) 

The possible configurations of two directions are shown in Figure 7. 

3.2 Angles 

The knowledge that two directions intersect or not is not expressive enough: it 
is necessary to be able to distinguish other particular features. 

Straight and perpendicular angles between two intersecting directions have 
to be differentiated from ordinary angles. 

The angles defined below extend from null angle to straight angle: 

D 26 $\widehat{xyz} \equiv \mathrm{INT}\,y\,yx\,yz$ 

Two directions are said to be perpendicular iff one is the perpendicular 
bisector of a segment of the other (Figure 8). 

D 27 $xy \perp x'y' \equiv \exists z\,(\mathrm{INT}\,zxyx'y' \wedge \exists u, v, w\,(uv \parallel xy \wedge wz \parallel x'y' \wedge \mathrm{PB}\,wuv \wedge \mathrm{PB}\,zuv))$ 
(the directions $xy$ and $x'y'$ are perpendicular) 




Fig. 8. Perpendicular directions 



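A right angle is defined by two perpendicular directions: 

D 28 $\mathrm{T}\,xyz \equiv xy \perp yz$ ($\widehat{xyz}$ is a right angle) 

Let us define some trivial angle configurations: 

D 29 $\mathrm{STR}\,xyz \equiv \widehat{xyz} \wedge \mathrm{BTW}\,yxz$ ($\widehat{xyz}$ is a straight angle) 

D 30 $\mathrm{NULL}\,xyz \equiv \widehat{xyz} \wedge \mathrm{SSD}\,zxy$ ($\widehat{xyz}$ is a null angle) 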



By comparing an angle to a right one, we obtain the following definitions of 
acute and obtuse angles: 

D 31 $\mathrm{AG}\,xyz \equiv \widehat{xyz} \wedge \exists w\,(\mathrm{T}\,wyz \wedge \mathrm{BTW}\,xwz)$ ($\widehat{xyz}$ is an acute angle) 

In the above definition we assume that $F\,\delta(y, x)\,\delta(y, z)$. It is always possible 
to construct an equal angle verifying this property. 

D 32 $\mathrm{OBT}\,xyz \equiv \widehat{xyz} \wedge \exists w\,(\mathrm{T}\,wyz \wedge \mathrm{BTW}\,wxz)$ ($\widehat{xyz}$ is an obtuse angle) 

Fig. 9. Angle xyz 
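
The five angle categories (null, acute, right, obtuse, straight) can be illustrated numerically when the centres of the spheres x, y and z are known. This sketch substitutes a coordinate-based test (dot and cross products) for the qualitative definitions D 28 to D 32; it is not the authors' construction, and `classify_angle` is a hypothetical name.

```python
import math


def classify_angle(x, y, z, eps=1e-9):
    """Classify the angle xyz (vertex y) from the centres of three spheres.

    x, y and z are (px, py) tuples; x and z must not be concentric with y.
    """
    ux, uy = x[0] - y[0], x[1] - y[1]
    vx, vy = z[0] - y[0], z[1] - y[1]
    nu, nv = math.hypot(ux, uy), math.hypot(vx, vy)
    if nu <= eps or nv <= eps:
        raise ValueError("x and z must not be concentric with y")
    # Cosine and sine of the angle between the directions yx and yz.
    cos_a = (ux * vx + uy * vy) / (nu * nv)
    sin_a = (ux * vy - uy * vx) / (nu * nv)
    if abs(sin_a) <= eps:
        # x, y and z are aligned: same side of y (null) or y between (straight).
        return "null" if cos_a > 0 else "straight"
    if abs(cos_a) <= eps:
        return "right"
    return "acute" if cos_a > 0 else "obtuse"
```

For instance, `classify_angle((1, 0), (0, 0), (1, 1))` falls in the acute category, and moving z to `(-1, 1)` makes the angle obtuse.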



4 Shape: Skeleton and Thickness 

To measure the thickness at various spots of an object, we use maximal internal 
spheres. Other works use similar approaches, such as Lundell [8] for axial 
representation, or Pasqual del Pobil and Serna, who determine a perfect covering 
of the object using an infinity of internal maximal circles. Our spheres in general 
do not overlap and are externally tangent or disconnected. By using the convex hull 
of the set of generated spheres we obtain a good approximation of the shape of 
the object. The strong point is that we need few spheres to determine such a 
shape. 

Of course the notion of shape also covers the architecture of the object, i.e. 
the relative position of parts of the object. As illustrated in Figure 10, the 
thickness of an object may be constant whereas parts of it are differently set 
out. 




Fig. 10. Thickness and skeleton 



This example demonstrates the necessity to extract the skeleton of the object, 
i.e. the configuration of its different significant directions. 

Those two types of information are required to fully capture the shape of an 
object. The skeleton is represented by directions defined by spheres (cf. Section 
3) and the thickness is measured by the size of those spheres. The shape of 
an object is then composed of basic elements called segments. Each segment is 
defined by two spheres, thus representing both a direction (direction of the two 
spheres) and a thickness (convex hull of the two spheres). 

Notation: [x, y] is the segment composed of the spheres x and y. We will note 
[x] the degenerate case of a “segment” composed of two identical spheres. 

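In a first stage, we focused on a particular class of objects that have a simple 
skeleton: convex objects. 

A convex object is defined by the following formula (Figure 11): 

D 33 $\mathrm{CONV}\,x \equiv \forall u \forall v \forall w\,(\mathrm{SPH}\,u \wedge \mathrm{SPH}\,v \wedge \mathrm{SPH}\,w \wedge (\mathrm{P}\,ux \wedge \mathrm{P}\,vx \wedge u =_{cg} v \wedge w =_{cg} u \wedge \mathrm{BTW}\,wuv \rightarrow \mathrm{P}\,wx))$ 
($x$ is convex) 

The following axiom entails the existence and uniqueness of the convex hull 
of an object x (i.e. the smallest (wrt P) convex object y that contains x): 

A 16 $\exists y\,(\mathrm{CONV}\,y \wedge \forall z\,(\mathrm{CONV}\,z \wedge \mathrm{P}\,xz \rightarrow \mathrm{P}\,yz))$ ($y$ is the convex hull of $x$, noted 
$\mathrm{CH}\,x$) 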




Fig. 11. Convexity: x is not convex, y is. 



In the case of a convex x, the following algorithm provides a construction of 
the representation of the shape S (i.e. skeleton + thickness): 

Initialization: stop = false; t = x; S = ∅ 
While not stop do 

  • CC = {maximal (w.r.t. F) connex parts of t} 
  • MS = maximal (w.r.t. F) sphere part of an element of CC 
  • For each cc ∈ CC do 
      • MScc = {maximal (w.r.t. F) sphere parts of cc} 
      • TS = {spheres of S connected to cc} 
      • if ∃s ∈ MScc (MS =cg s) then Case TS 
          * = ∅ → 
              • if MScc = {s} then S = S ∪ {[s]}; t = t − i(s); 
              • else S = S ∪ {[s1, s2]} such that ∀t1, t2 ∈ MScc 
                (F(δ(t1, t2), δ(s1, s2))); t = t − i(s1) − i(s2) − i(cc') where cc' is 
                the maximal connex part of t − s1 − s2 connected to s1 and s2; 
          * = {x} → 
              • if MScc = {s} then S = S ∪ {[s, x]}; t = t − i(s); 
              • else S = S ∪ {[s1, x], [s2, x]} such that ∀t1, t2 ∈ MScc 
                (F(δ(t1, t2), δ(s1, s2))); t = t − i(s1) − i(s2) − i(cc') where cc' is 
                the maximal connex part of t − s1 − s2 connected to s1 and s2; 
          * else stop = true 

Notation: i(s) denotes the interior of s. 

Remarks: convexity ensures that there is no need to consider 
a sphere in a connex part already connected to two previously built spheres. 
This new sphere would be part of the convex hull of the sum of the two spheres, 
hence the halting condition of the algorithm. 

The size of the built spheres is decreasing. Thus, we eventually reach a sphere 
in a connex part already connected to two previously built spheres, and the algorithm 
always terminates. 







Fig. 12. Different types of shape (panels: Almond, Sharp, Eye, Bullet; algorithm steps) 



The algorithm allows a taxonomy of various types of convex objects. For 
instance, we can deduce information as soon as the algorithm has generated 
three spheres, depending on how many spheres have been created at each step of 
the algorithm (numbers in parentheses). The remarkable cases illustrated 
in Figure 12 are: 

1. eye type object (1+ 2) 

2. sharp or almond shaped (1+1+1) 

3. bullet (2+1) 

Increasing the precision (i.e. more spheres) only entails more specialization 
of the above types. 

A main direction can be defined for each of the previous types of object, as 
presented in Figure 12. Note that no main direction emerges from the structure 
of a regular object such as a regular polygon (cf. Figure 13). 

Fig. 13. Regular polygons with no main direction 



4.1 The Case of a Convex Polygon 

The basic operation of the algorithm is the determination of the maximal interior 
sphere of an object. Such an operation is not trivial in the general case. Any 
convex object can be approximated by a convex polygon as precisely as desired. 
The determination of the maximal internal sphere can be realized for convex 
polygons using generalized Voronoi diagrams. In this case the Voronoi points are 
the points in the polygon that are intersection of the bisectors of three edges. 
So these points are centers of the maximal circles that are tangent to at least 
three edges. We extended this method for the case of polygons that have one 
edge replaced by an arc of a circle. The complexity of the algorithm is O(n^) if 
n is the number of edges of the polygon. 
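
The characterization of candidate centers as points equidistant from three edges can be illustrated with a brute-force sketch. It is not the generalized Voronoi diagram method mentioned above: it simply tries every triple of edges and keeps the largest circle that stays inside the polygon. The function names and the counter-clockwise vertex order are assumptions made for the example.

```python
import math
from itertools import combinations


def edge_lines(poly):
    """Return (nx, ny, c) per edge of a counter-clockwise convex polygon, so that
    nx*px + ny*py - c is the distance from an interior point (px, py) to the edge."""
    lines = []
    for i in range(len(poly)):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % len(poly)]
        dx, dy = x2 - x1, y2 - y1
        length = math.hypot(dx, dy)
        nx, ny = -dy / length, dx / length  # inward unit normal (CCW order)
        lines.append((nx, ny, nx * x1 + ny * y1))
    return lines


def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def max_inscribed_circle(poly, eps=1e-9):
    """Return (cx, cy, r) of a largest circle inside a convex polygon, or None."""
    lines = edge_lines(poly)
    best = None
    for trio in combinations(lines, 3):
        # Unknowns (px, py, r): nx*px + ny*py - r = c for each edge of the triple.
        a = [[nx, ny, -1.0] for nx, ny, _ in trio]
        b = [c for _, _, c in trio]
        d = det3(a)
        if abs(d) < eps:
            continue  # degenerate triple, no unique equidistant point
        sol = []
        for k in range(3):  # Cramer's rule
            m = [row[:] for row in a]
            for i in range(3):
                m[i][k] = b[i]
            sol.append(det3(m) / d)
        px, py, r = sol
        # Keep the circle only if it does not cross any edge of the polygon.
        if r > 0 and all(nx * px + ny * py - c >= r - eps for nx, ny, c in lines):
            if best is None or r > best[2]:
                best = (px, py, r)
    return best
```

For the right triangle with vertices (0, 0), (4, 0) and (0, 3), the sketch finds the incircle of radius 1 centered at (1, 1). Each triple is checked against all edges, so this version is much slower than a Voronoi-based approach and is meant only to show the geometric idea.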

5 Relative Orientation of Two Convex Objects 

The algorithm seen in the previous section associates a direction (segment) to 
a convex object (in the general case). We can deduce the relative orientation of 
two convex objects by studying the relative position of the associated segments. 

5.1 Intersection of Segments 

We define now the intersection of the directions induced by two segments. The 
intersection of two directions leads to a combinatorial study according to the 
position of z - the intersection sphere - with respect to the spheres defining the 
two segments. Note that z with the spheres of the two segments defines an angle. 
Let xy and x'y' be the two intersecting segments, we have the following cases: 



Same direction If xy and x'y' are parallel the two objects have the same 
direction. We distinguish two cases: 

$xy \parallel x'y' \wedge \mathrm{LIN}\,xyx'$: The two directions are identical. This case could be 
refined by considering the respective positions of the pairs of spheres constituting 
the segments: we would retrieve the RCC8 system for intervals, corresponding 
to Allen’s system without orientation (see Figure 14). 

$xy \parallel x'y' \wedge \neg\mathrm{LIN}\,xyx'$: The two directions are simply parallel. 

Fig. 14. RCC8 for segments 



If the directions of the two segments are not parallel they have an intersection. 
We study the relative position of the spheres of each segment with respect to 
this intersection. 

The “V”: xy and x'y' are on one side of z (see Figure 15). 

D 34 $\mathrm{V}\,xyx'y' \equiv \exists z\,(\mathrm{INT}\,zxyx'y' \wedge \mathrm{SSD}\,xyz \wedge \mathrm{SSD}\,x'y'z)$ 

In this case we can consider the angle xzx' which can be acute, obtuse or 
right. 

The “T”: In this case, a sphere z at the intersection is between the spheres of 
one segment, and on the same side of the other (see Figure 15). We can only 
distinguish between perpendicular and ordinary angles, the distinction obtuse/acute 
being irrelevant here. 

D 35 $\mathrm{T}\,xyx'y' \equiv \exists z\,(\mathrm{INT}\,zxyx'y' \wedge ((\mathrm{BTW}\,zxy \wedge \mathrm{SSD}\,x'y'z) \vee (\mathrm{BTW}\,zx'y' \wedge \mathrm{SSD}\,xyz)))$ 

Fig. 15. The “V” and the “T” 



The “L”: The “L” case is the limiting case between the “V” and the “T” that 
occurs when the intersection z is concentric with a sphere of one segment. The 
same distinctions as in the “V” case can be made between the two segments. 

D 36 $\mathrm{L}\,xyx'y' \equiv \exists z\,(\mathrm{INT}\,zxyx'y' \wedge ((\mathrm{SSD}\,xyz \wedge (\mathrm{CNC}\,zx' \vee \mathrm{CNC}\,zy')) \vee (\mathrm{SSD}\,x'y'z \wedge (\mathrm{CNC}\,zx \vee \mathrm{CNC}\,zy))))$ 

The “X”: xy and x'y' cross over in z (see Figure 16). In this case, we can only 
make the same distinctions as in the “T” case. 

D 37 $\mathrm{X}\,xyx'y' \equiv \exists z\,(\mathrm{INT}\,zxyx'y' \wedge \mathrm{BTW}\,zxy \wedge \mathrm{BTW}\,zx'y')$ 

Fig. 16. The “L” and the “X” 

Using this framework and the segments associated to the objects we are able 
to define notions such as: one object is directed towards the other (L or T), 
the two objects cross (X case), the objects are oriented in the same direction 
(parallel) or towards the same point. 
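
The case analysis of Section 5.1 can be sketched numerically when each segment is given by the centers of its two spheres. The sketch below is an illustration only: it works on coordinates rather than on the qualitative predicates, `classify_segments` is a hypothetical name, and configurations not covered by D 34 to D 37 as written (for example, two segments meeting at an endpoint of each) are reported as "other".

```python
def _position(s, eps):
    """Locate the intersection along a segment parametrised from 0 to 1."""
    if abs(s) <= eps or abs(s - 1) <= eps:
        return "endpoint"   # intersection concentric with a sphere of the segment
    return "between" if 0 < s < 1 else "same_side"


def classify_segments(x, y, xp, yp, eps=1e-9):
    """Classify segments [x, y] and [xp, yp], each point being a sphere centre."""
    dx, dy = y[0] - x[0], y[1] - x[1]
    ex, ey = yp[0] - xp[0], yp[1] - xp[1]
    wx, wy = xp[0] - x[0], xp[1] - x[1]
    cross = dx * ey - dy * ex
    if abs(cross) <= eps:
        # Parallel directions: identical if xp also lies on the line through x and y.
        return "identical" if abs(dx * wy - dy * wx) <= eps else "parallel"
    # Solve x + s*(y - x) = xp + t*(yp - xp) for the intersection point.
    s = (wx * ey - wy * ex) / cross
    t = (wx * dy - wy * dx) / cross
    kinds = sorted((_position(s, eps), _position(t, eps)))
    if kinds == ["between", "between"]:
        return "X"
    if kinds == ["between", "same_side"]:
        return "T"
    if kinds == ["endpoint", "same_side"]:
        return "L"
    if kinds == ["same_side", "same_side"]:
        return "V"
    return "other"
```

For example, the segments from (0, 0) to (2, 2) and from (0, 2) to (2, 0) cross in their middle and are classified as "X".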



6 Conclusion 



In this article we defined a logical framework to model the shape and the 
orientation of an object. This work is currently limited to 2D convex objects. In 
order to generalize the algorithm of Section 4 to objects of any shape (not just 
convex ones), we plan to consider a concave object as a sum of maximal convex 
components. The main difficulty is then to connect the skeleton of each part to 
obtain the global skeleton. 

Once this generalization is completed, we could finally envisage the application 
of this representation to practical problems such as shape recognition. To achieve 
such a goal, other domains could be of some help. For instance, the relation 
between topological graphs and the skeleton of an object must be made precise. Graph 
theory provides efficient algorithms that could be applied to the tree structure we 
proposed to model shape. Similar graph structures are used in chemistry 
to represent molecules, and structure (shape) comparison algorithms have been 
developed for many years. The comparison of two shapes as defined in this 
article could then be linked to the comparison of two molecules. 
Abstract. Tabling has become important to logic programming in part 
because it opens new application areas, such as model checking, to logic 
programming techniques. However, the development of new extensions 
of tabled logic programming is becoming restricted by the formalisms 
that underlie it. Formalisms for tabled evaluations, such as SLG [3], are 
generally developed with a view to a specific set of allowable operations 
that can be performed in an evaluation. In the case of SLG, tabling op- 
erations are based on a variance relation between atoms. While the set 
of SLG tabling operations has proven useful for a number of applica- 
tions, other types of operations, such as those based on a subsumption 
relation between atoms, can have practical uses. In this paper, SLG is 
reformulated in two ways: so that it can be parameterized using different 
sets of operations; and so that a forest of trees paradigm is used. 
Equivalence to SLG of the new formulation, Extended SLG or SLGx, is shown 
when the new formalism is parameterized by variant-based operations. 
In addition, SLGx is also parameterized by subsumption-based opera- 
tions and shown correct for queries to the well-founded model. Finally, 
the usefulness of the forest of trees paradigm for motivating tabling op- 
timizations is shown by formalizing the concept of relevance within a 
tabled evaluation. 



Keywords: Logic Programming, Non-Monotonic Reasoning 

1 Introduction 

The ability to compute queries according to the Well-Founded Semantics [19] 
(WFS) and its extensions has proven useful for a number of applications, in- 
cluding model checking [12] and diagnosis [5]. Within the logic programming 
community, evaluation of the WFS is commonly done using tabled evaluation 
methods [3,2,4]. Of these, SLG resolution [3] (Linear resolution with Selection 
function for General logic programs) is of interest because it can be used with 
non-ground programs and because it has formed the foundation of extended im- 
plementation efforts (such as the XSB system [7]). Extensions to SLG address 
constructive negation [10], and a property called delay minimality [16]. 

Understood in their proper context, the ideas that differentiate SLG from 
other tabling methods are relatively simple. SLG breaks down tabled evaluation 
into a set of primitive operations (or, equivalently transformations). To handle 
possible loops through negation, SLG provides a delaying operation to dynamically 
adjust a computation rule, along with a simplification operation to resolve 
away delayed literals when their truth value becomes known. SLG also ensures 
polynomial data complexity through controlling the manner in which delayed lit- 
erals are propagated and by marking as completed subgoals that will not benefit 
from further resolution. 

SLG was originally presented [3] using a notation in which sets of objects, 
called X-elements, are associated with ordinals, upon which transfinite induction 
is performed. Despite its power, the formulation can be somewhat difficult to 
learn and use, particularly when exploring operational aspects of SLG. Indeed, a 
stated goal of [20] is to derive a calculus for computing the well-founded seman- 
tics that is simpler to understand than SLG. In addition, many of the definitions 
in the original formulation of SLG assume a variance relation between terms to 
determine when a new subcomputation should arise, when an answer should be 
used for resolution against a selected literal or when a delayed literal should be 
simplified. However, to derive new computation strategies for the well-founded 
semantics, or for extensions of the well-founded semantics, new operations are 
needed beyond these variant operations. 

This paper defines a reformulation of SLG called Extended SLG (SLGx) that 
reformulates SLG in two ways. First, because trees may naturally be formulated 
as sets (see e.g. [6]) SLG may be modeled using forests of trees rather than 
explicit sets of X-elements. We believe the resulting framework is clearer than 
the original formulation of SLG; however it does not lose any of the power of 
SLG to formalize transfinite computations. Indeed, an informal forest of trees 
model of SLG has been used to derive scheduling properties of tabled evaluations 
[8] and to motivate the design of an abstract machine for SLG [14, 15]. 

Second, definitions in SLGx are geared so that underlying tabling operations 
are parameterizable. The first result of this paper is to prove full equivalence 
to SLG of SLGx parameterized with variant-style operations (called in this 
paper SLG_variance). The second main result is to parameterize SLGx with 
subsumption-based tabling operations to form a method called SLG_subsumption, 
and to prove the soundness and completeness of SLG_subsumption for the 
well-founded semantics. While the use of subsumption is not new to tabled evaluations 
of definite programs (cf. [18]), the formulation of a fully subsumption-based 
tabling method for the well-founded semantics is novel, to our knowledge. 

The practical usefulness of SLGx is also demonstrated through an 
optimization called relevance. SLG was defined so that a computation terminates only 
when all answers have been derived for an initial query Q along with all subgoals 
that arise in solving Q. However, it can sometimes be useful to ignore a 
subcomputation when it can be determined that the subcomputation will produce no 
further answers for the initial query. Section 5 formalizes these notions. 

Before proceeding further, we informally introduce concepts of SLGx, as 
parameterized by variant-style operations, through the following example. 

Example 1. Consider a possible SLG_variance evaluation E_1 of the query ?- p(X) 
against the program P_1: 

p(b). 
p(c) :- not p(a). 
p(X) :- t(X,Y,Z), not p(Y), not p(Z). 

t(a,b,a). 
t(a,a,b). 

An SLG_variance evaluation, like any SLGx evaluation, consists of a sequence 
of forests of SLG trees, whose nodes either have the form 

Answer_Template :- Delay_Set | Goal_List 

or fail. In the first form, the Answer_Template is used to represent bindings 
made to a tabled subgoal in the course of resolution along a computation path; 
the Delay_Set contains a set of literals that have been selected by a fixed-order 
literal selection strategy but whose evaluation has been delayed; and the Goal_List 
is a sequence of unresolved literals. SLGx requires a fixed literal selection 
strategy. In this paper we assume, without loss of generality, that the literal selection 
strategy is left-to-right within the Goal_List. The evaluation of the query ?- p(X) 
against P_1 begins with a tree for p(X) with root node p(X) :- | p(X). In a root 
node, the Answer_Template reflects no bindings to the subgoal, the Delay_Set 
is empty, and the Goal_List is identical to the original subgoal. Children of root 
nodes are produced through the Program Clause Resolution operation, 
and the children of p(X) :- | p(X) are depicted in Figure 1. In this figure, 
numbers associated with nodes depict their order of creation in E_1. Of these children, 
node 1, p(b) :- |, is an answer node, defined as a leaf node whose Goal_List is 
empty. The selected literals of the other two children of p(X) :- | p(X) produce 
new tabled subgoals, t(X,Y,Z) and p(a), through the SLG_variance operation 
New Subgoal (note that in SLG_variance a new tree is created for p(a) even 
though p(a) is subsumed by p(X)). The children of non-root nodes whose 
selected literals are positive (such as node 4) are produced through the Positive 
Return operation, while the children of nodes with negative selected literals 
(such as node 2) may be produced through the Negative Return operation. 
Because an answer for p(b) is contained in node 9 in the forest of Figure 1, a 
failure node (node 10) is produced as a child for p(a) :- | not p(b), not p(a). A 
failure node is subject to no further SLG_variance operations, and indicates that 
the computation path leading up to it has failed. 

In Figure 1, all possible New Subgoal, Program Clause Resolution, 
Positive Return and Negative Return operations have been performed. 
Despite this, the truth value of p(a) cannot be determined since the tree for p(a) 
depends on node 19 (p(a) :- | not p(a), not p(b)), in which the literal not p(a) is 
selected. SLG (and SLGx) overcomes this problem of self-dependency through 
negation by using a Delaying operation which, in this case, creates a new node 
p(a) :- not p(a) | not p(b) (node 20) in which not p(a) has been moved from the 
Goal_List into the Delay_Set. In Figure 2, Delaying operations produce nodes 
21 and 22 in addition to node 20. Delaying a negative literal allows other literals 
to be selected whose failure may break the self-dependency through negation 
(e.g. not p(b) is selected in node 22). Figure 2 includes relevant portions of a 
final forest reached after restarting all subcomputations enabled after delaying 
negative literals. In node 20, for example, the literal not p(b) is selected and 
a Negative Return causes the computation path leading up to that node 
to fail. Delaying not p(a) also creates node 22, p(c) :- not p(a) |, which is 
termed a conditional answer, because it has a non-empty Delay_Set. After node 
23 is produced, all computation paths stemming from p(a) are failed. The result 
is that the literal not p(a) should in fact succeed, and the conditional answer 
p(c) :- not p(a) | should become unconditional: not p(a) should be removed 
from its Delay_Set. SLG_variance uses a Simplification operation to produce 
the unconditional answer in node 25, p(c) :- |. Figure 2 shows the state of 
the SLG_variance evaluation after all possible SLG_variance operations have been 
applied. Note that each tree has been marked with the token complete, denoting 
that it has been completely evaluated and can produce no more answers. 

We now turn to formalizing the concepts presented in Example 1. 



2 A Framework for Parameterizing Tabling Operations 

Terminology and assumptions We assume the standard terminology of logic 
programming (see, e.g. [11]). We assume that any program is defined over a 
countable language $\mathcal{L}$ of predicates and function symbols. If L is a literal, then 
vars(L) denotes the set of variables in L. The Herbrand base $H_P$ of a program 
P is the set of all ground atoms formed from $\mathcal{L}$. By a 3-valued interpretation I 
of a program P we mean a set of literals defined over $H_P$. For $A \in H_P$, if $A \in I$, 
A is true in I, and if not $A \in I$, A is false in I. When I is an interpretation and 
A is an atom, $I|_A$ refers to 

$\{L \mid L \in I \text{ and } (L = G \text{ or } L = \mathit{not}\ G) \text{ and } G \text{ is in the ground instantiation of } A\}$ 

In the following sections, we use the terms goal, subgoal, and atom inter- 
changeably. Variant terms are considered to be identical. SLGx evaluations 
allow arbitrary, but fixed literal selection strategies. For simplicity, throughout 
this paper we assume that literals are selected in a left-to-right order. 

We now provide definitions for concepts introduced in Example 1. 

Definition 1 (SLGx Trees and Forest). An SLG forest consists of a forest 
of SLG trees. Nodes of SLG trees have the forms: 

Answer_Template :- Delay_Set | Goal_List 

or 

fail 

In the first form, the Answer_Template is an atom, the Delay_Set is a set of 
delayed literals (see Definition 2) and Goal_List is a sequence of literals. The 
second form is called a failure node. The root node of an SLG tree may be marked 
with the token complete. 

We call a node N an answer when it is a leaf node for which Goal_List is 
empty. If the Delay_Set of an answer is empty it is termed an unconditional 
answer; otherwise, it is a conditional answer. 
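
The node forms of Definition 1 map naturally onto a small data structure. The following is a minimal sketch, assuming literals are kept as plain strings; `Node`, `FAIL` and the helper functions are hypothetical names, and whether a node is a leaf is passed in explicitly because it depends on the enclosing tree.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

FAIL = "fail"  # stands for the second node form, the failure node


@dataclass(frozen=True)
class Node:
    """A node of the form Answer_Template :- Delay_Set | Goal_List."""
    answer_template: str
    delay_set: FrozenSet[str] = frozenset()
    goal_list: Tuple[str, ...] = ()

    def __str__(self) -> str:
        delays = ", ".join(sorted(self.delay_set))
        goals = ", ".join(self.goal_list)
        return f"{self.answer_template} :- {delays} | {goals}"


def is_answer(node: Node, is_leaf: bool = True) -> bool:
    """An answer is a leaf node whose Goal_List is empty."""
    return is_leaf and not node.goal_list


def is_unconditional(node: Node, is_leaf: bool = True) -> bool:
    """An unconditional answer is an answer with an empty Delay_Set."""
    return is_answer(node, is_leaf) and not node.delay_set


def root_node(subgoal: str) -> Node:
    """Root of the tree for a subgoal S, of the form S :- | S."""
    return Node(subgoal, frozenset(), (subgoal,))
```

In this encoding, `Node("p(c)", frozenset({"not p(a)"}))` is a conditional answer, and `root_node("p(X)")` is not an answer because its Goal_List is non-empty.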

Definition 2 specifies the exact formulation of delay literals. Definitions 8 and 
9 will ensure that the root node of a given SLG tree, T, has the form S :- | S, 
where S is a subgoal. If T is an SLG tree in a forest $\mathcal{F}$ whose root node is S :- | S 
(possibly marked as complete), then we use the following terminology: S is the 
root node for T, T is the tree for S, and S is in $\mathcal{F}$. 

Definition 2 (Delay Literals). A negative delay literal in the Delay_Set of a 
node N has the form not A, where A is a ground atom. Positive delay literals 
have the form $A^{Call}_{Answer}$, where A is an atom whose truth value depends on the 
truth value of some answer Answer for the subgoal Call. If $\theta$ is a substitution, 
then $(A^{Call}_{Answer})\theta = (A\theta)^{Call}_{Answer}$. 

Positive delay literals contain information so that they may be simplified 
when a particular answer to a particular call becomes unconditionally true or 
false. It is useful to define answer resolution so that it takes into account the 
form of delay literals. 

Definition 3 (Answer Resolution). Let N be a node $A$ :- $D \mid L_1, \ldots, L_n$, where 
$n > 0$. Let Ans = $A'$ :- $D' \mid$ be an answer whose variables have been standardized 
apart from N. N is SLG resolvable with Ans if $\exists i$, $1 \le i \le n$, such that $L_i$ and 
$A'$ are unifiable with an mgu $\theta$. The SLG resolvent of N and Ans on $L_i$ has the 
form 

$(A$ :- $D \mid L_1, \ldots, L_{i-1}, L_{i+1}, \ldots, L_n)\theta$ 

if $D'$ is empty, and 

$(A$ :- $D, D_i \mid L_1, \ldots, L_{i-1}, L_{i+1}, \ldots, L_n)\theta$ 

otherwise, where $D_i = L_i$ if $L_i$ is negative, and $D_i$ is the positive delay literal 
built from $L_i$ and the answer $A'$ (Definition 2) otherwise. 

A set of subgoals is completely evaluated when it can produce no more an- 
swers. Formally, 

Definition 4 (Completely Evaluated). A set $\mathcal{S}$ of subgoals in a forest $\mathcal{F}$ is 
completely evaluated if at least one of the conditions holds for each $S \in \mathcal{S}$: 

1. The tree for S contains an answer S :- |; or 

2. For each node N in the tree for S: 

(a) The selected literal $L_S$ of N is completed or in $\mathcal{S}$; or 

(b) There are no applicable New Subgoal, Program Clause Resolution, 
Positive Return, Delaying, or Negative Return operations 
(Definition 9) for N. 

Once a set of subgoals is determined to be completely evaluated, the completion 
operation marks the root node of the trees for each subgoal (Definition 1). 

According to Definition 3, if a conditional answer is resolved against the se- 
lected literal in the Goal-List of a node, the information about the delayed liter- 
als in the answer need not be propagated. In [3] it is shown that the propagation 
of delay elements as specified in Definition 3 is necessary to ensure polynomial 
data complexity. However, in certain cases, the propagation of delayed answers 
can lead to a set of unsupported answers as shown in the example below. 

Example 2. Consider the program P_2: 

p <- not q. 
p <- p. 
q <- not p. 
q. 

and query ?- p. In the well-founded model for P_2, p is false and q is true. A 
forest for a possible SLG_variance (or SLG_subsumption) evaluation is shown in 
Figure 3, whose nodes are numbered by the order of their creation. Consider the 
sub-forest of Figure 3 induced by all nodes numbered 6 or less. In nodes 4 and 
6 the literals not p and not q have been delayed creating conditional answers 
for nodes 4 and 5. The conditional answer for p (node 5) is returned to node 
6 creating a second conditional answer, node 6. Subsequently, in node 7, an 
unconditional answer is found for q, causing q to be successful (Definition 7), 
and a Simplification operation to be performed that creates the failure node, 
node 8. However, p, although it is completely evaluated, cannot be determined 
to be false, because it has a conditional answer that depends positively on itself, 
or is unsupported (see Definition 5). Unsupported answers are handled through 
the Answer Completion operation. 
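
The well-founded models stated for the programs of Examples 1 and 2 can be cross-checked independently of SLG. The sketch below uses the alternating fixpoint characterization of the well-founded semantics on ground programs, not tabled evaluation; the rule encoding (head, positive body, negative body) and the function names are assumptions made for the illustration, and only the relevant ground instances of the first program are listed.

```python
def least_model(rules, assumed_true):
    """Least model of the reduct of a ground program: rules containing
    'not a' with a in assumed_true are dropped, other negative literals hold."""
    facts = set()
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            if head in facts or any(a in assumed_true for a in neg):
                continue
            if all(a in facts for a in pos):
                facts.add(head)
                changed = True
    return facts


def well_founded(rules):
    """Return (true_atoms, false_atoms) of the well-founded model."""
    atoms = {h for h, _, _ in rules}
    atoms |= {a for _, pos, neg in rules for a in (*pos, *neg)}
    true = set()
    while True:
        possible = least_model(rules, true)      # overestimate of true atoms
        new_true = least_model(rules, possible)  # underestimate of true atoms
        if new_true == true:
            return true, atoms - possible
        true = new_true


# Relevant ground instances of the program of Example 1.
P1 = [
    ("p(b)", (), ()),
    ("p(c)", (), ("p(a)",)),
    ("p(a)", ("t(a,b,a)",), ("p(b)", "p(a)")),
    ("p(a)", ("t(a,a,b)",), ("p(a)", "p(b)")),
    ("t(a,b,a)", (), ()),
    ("t(a,a,b)", (), ()),
]

# The program of Example 2.
P2 = [
    ("p", (), ("q",)),
    ("p", ("p",), ()),
    ("q", (), ("p",)),
    ("q", (), ()),
]
```

Calling `well_founded(P1)` makes p(b) and p(c) true and p(a) false, and `well_founded(P2)` makes q true and p false, in agreement with the two examples.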

Definition 5 (Supported Answer). Let $\mathcal{F}$ be an SLG forest, S a subgoal in 
$\mathcal{F}$, and Answer be an atom that occurs in the head of some answer of S. Then 
Answer is supported by S in $\mathcal{F}$ if and only if: 

1. S is not completely evaluated; or 

2. there exists an answer node Answer :- Delay_Set | of S such that for every 
positive delay literal $D^{Call}_{Ans}$ in Delay_Set, Ans is supported by Call. 

As an aside, we note that unsupported answers appear to be uncommon in 
practical evaluations which minimize the use of delay such as [16]. 

An SLGx evaluation consists of a (possibly transfinite) sequence of SLG 
forests. In order to define the behavior of an evaluation at a limit ordinal, we 
define a notion of a least upper bound for a set of SLG trees. If a global ordering 
on literals is assumed, then the elements in the DelayPet of a node can be 
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uniformly ordered, and under this ordering a node of a tree can be taken as a 
term to which the usual definitions of variance and subsumption apply (See the 
full version of this paper, in http://www.cs.sunysb.edu/~tswift for details). 
In particular, nodes of SLG trees are treated as identical when they are variant. 

A rooted tree can be viewed as a partially ordered set in which each node
N is represented as {N, P} in which P is a tuple representing the path from N
to the root of the tree [6]. When represented in this manner, it is easily seen
that when T_1 and T_2 are rooted trees, T_1 ⊆ T_2 iff T_1 is a subtree of T_2, and
furthermore, that if T_1 and T_2 have the same root, their union can be defined as
their set union, for T_1 and T_2 taken as sets.
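
The set view of rooted trees can be illustrated with a short sketch. The Python below is only an illustration: the nested `(label, [children])` encoding and the name `tree_as_set` are assumptions of this example, and paths are stored from the root down to the node rather than from the node up to the root.

```python
# Illustrative sketch: a rooted tree as a set of (node, path) pairs.
# The (label, [children]) encoding is a hypothetical choice for this example.

def tree_as_set(tree, path=()):
    """Return the set of (node, path) pairs of a tree given as (label, [children])."""
    label, children = tree
    here = path + (label,)
    pairs = {(label, here)}
    for child in children:
        pairs |= tree_as_set(child, here)
    return pairs

t1 = ("p", [("q", [])])
t2 = ("p", [("q", []), ("r", [("s", [])])])
t3 = ("p", [("t", [])])

s1, s2, s3 = tree_as_set(t1), tree_as_set(t2), tree_as_set(t3)

# T_1 is a subtree of T_2 exactly when its pair set is contained in that of T_2.
is_subtree = s1 <= s2

# Two trees with the same root are joined by plain set union.
joined = s1 | s3
```

Here `joined` equals the pair set of the tree with root p and children q and t, which is the behaviour needed when forests are combined at a limit ordinal.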

Definition 6 (Tabled Evaluation). Given a program P, an atomic query Q
and a set of tabling operations (from either Definition 8 or Definition 9), a tabled
evaluation E is a sequence of SLG forests F_0, F_1, ..., F_β such that:

— F_0 is the forest containing a single tree Q :- | Q.

— For each successor ordinal n + 1 ≤ β, F_{n+1} is obtained from F_n by an
application of a tabling operation.

— For each limit ordinal α ≤ β, F_α is defined as the set of trees T such that

• the root of T, S :- | S, is the root of some tree in a forest F_i, i < α; and

• T = ∪{T_i | T_i ∈ F_i, i < α, and T_i has root S :- | S}.

If no operation is applicable to F_β, F_β is called a final forest of E. If F_β contains
a leaf node with a non-ground selected negative literal, it is floundered.

SLG forests are related to interpretations in the following manner. 

Definition 7. Let F be a forest. Then the interpretation induced by F, I_F, has
the following properties.

— A (ground) atom A ∈ I_F iff A is in the ground instantiation of some
unconditional answer Ans :- | in F.

— A (ground) literal not A ∈ I_F iff A is in the ground instantiation of a
completely evaluated subgoal in F, and A is not in the ground instantiation of
any answer in F.

An atom S is successful in F if the tree for S has an unconditional answer S.
S is failed in F if S is completely evaluated in F and the tree for S contains no
answers. An atom S is successful (failed) in I_F if S' (not S') is in I_F for every
S' in the ground instantiation of S. A negative delay literal not D is successful
(failed) in a forest F if D is failed (successful) in F. Similarly, a positive
delay literal D^Call_Ans is successful (failed) in F if Call has an unconditional
answer Ans :- | in F.
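
For ground programs, the induced interpretation of Definition 7 can be computed directly from a forest. The following Python sketch assumes a hypothetical encoding in which a forest is a dictionary from ground subgoals to a completion flag and a list of `(head, delay_set)` answers; none of these names come from the paper.

```python
# Illustrative sketch of Definition 7 for ground subgoals only.
# forest: {subgoal: {"complete": bool, "answers": [(head, delay_set), ...]}}

def induced_interpretation(forest):
    """Return (true_atoms, false_atoms) induced by a ground forest."""
    true_atoms, answered = set(), set()
    for tree in forest.values():
        for head, delay_set in tree["answers"]:
            answered.add(head)
            if not delay_set:          # unconditional answer
                true_atoms.add(head)
    # An atom is false if its subgoal is completely evaluated and it has no answer at all.
    false_atoms = {s for s, tree in forest.items()
                   if tree["complete"] and s not in answered}
    return true_atoms, false_atoms

example_forest = {
    "q": {"complete": True, "answers": [("q", frozenset())]},
    "p": {"complete": True, "answers": [("p", frozenset({"p"}))]},
    "r": {"complete": True, "answers": []},
}
true_atoms, false_atoms = induced_interpretation(example_forest)
```

In this example q is true and r is false, while p is neither: it is completely evaluated but still has a conditional answer, which is the situation that Answer Completion is meant to resolve.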

3 SLG_variance: Variant Tabling with Delay

SLG_variance uses a variant relation on terms to determine when to add a new
SLG tree to the SLG forest in the New Subgoal operation; to determine
whether an answer or program clause may be used for resolution in the Program
Clause Resolution and Positive Return operations; and in removing a
delay literal or failing a conditional answer in the Simplification operation.
These operations are as follows.

Definition 8 (SLG_variance Operations). Given a forest F_n of a SLG_variance
evaluation of program P and query Q, where n is a non-limit ordinal, F_{n+1}
may be produced by one of the following operations.

1. New Subgoal: Let F_n contain a non-root node

N = Ans :- Delay_Set | G, Goal_List

where G is the selected literal S or not S. Assume F_n contains no tree with
root subgoal S. Then add the tree S :- | S to F_n.

2. Program Clause Resolution: Let F_n contain a root node N = S :- | S
and C be a program clause Head :- Body such that Head unifies with S with
mgu θ. Assume that in F_n, N does not have a child N_child = (S :- | Body)θ.
Then add N_child as a child of N.

3. Positive Return: Let F_n contain a non-root node N whose selected literal
S is positive. Let Ans be an answer node for S in F_n and N_child be the SLG
resolvent of N and Ans on S. Assume that in F_n, N does not have a child
N_child. Then add N_child as a child of N.

4. Negative Return: Let F_n contain a leaf node

N = Ans :- Delay_Set | not S, Goal_List

whose selected literal not S is ground.

(a) Negation Success: If S is failed in F_n, then create a child for N of the
form: Ans :- Delay_Set | Goal_List.

(b) Negation Failure: If S succeeds in F_n, then create a child for N of the
form fail.

5. Delaying: Let F_n contain a leaf node N = Ans :- Delay_Set | not S, Goal_List,
such that S is ground in F_n, but S is neither successful nor failed in F_n.
Then create a child for N of the form Ans :- Delay_Set, not S | Goal_List.

6. Simplification: Let F_n contain a leaf node N = Ans :- Delay_Set |, and let
L ∈ Delay_Set.

(a) If L is failed in F_n, then create a child fail for N.

(b) If L is successful in F_n, then create a child Ans :- Delay_Set' | for N,
where Delay_Set' = Delay_Set − L.

7. Completion: Given a completely evaluated set S of subgoals (Definition 4),
mark the trees for all subgoals in S as completed.

8. Answer Completion: Given a set of unsupported answers UA, create a
failure node as a child for each answer Ans ∈ UA.
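
As a rough illustration of the Simplification operation, the sketch below processes the delay set of a ground conditional answer. It is a simplification under stated assumptions: delay literals are plain strings, the sets `successful` and `failed` stand in for the status of atoms in the forest, a positive delay literal is treated as failed when its atom is failed, and all applicable simplifications are applied in one call, whereas the definition applies them one literal at a time. The function names are hypothetical.

```python
# Illustrative sketch of Simplification on a ground conditional answer.
# Delay literals are strings such as "r" or "not q" (hypothetical encoding).

def literal_status(literal, successful, failed):
    """Classify a delay literal as 'successful', 'failed' or 'undetermined'."""
    if literal.startswith("not "):
        atom = literal[4:]
        if atom in failed:
            return "successful"
        if atom in successful:
            return "failed"
        return "undetermined"
    if literal in successful:
        return "successful"
    if literal in failed:
        return "failed"
    return "undetermined"

def simplify(head, delay_set, successful, failed):
    """Return None for a fail child, else the answer with its reduced delay set."""
    remaining = set(delay_set)
    for literal in sorted(delay_set):
        status = literal_status(literal, successful, failed)
        if status == "failed":
            return None                      # case (a): create a child fail
        if status == "successful":
            remaining.discard(literal)       # case (b): drop the literal
    return head, frozenset(remaining)
```

For instance, `simplify("p", {"not q", "r"}, successful={"r"}, failed={"q"})` yields an unconditional answer for p, whereas a delay literal whose atom is neither successful nor failed is left in place.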

An interpretation induced by a forest (Definition 7) has its counterpart
for SLG (Definition 5.2 of [3]). Using these concepts, we can relate SLG to
SLG_variance evaluations.

Theorem 1. Let P be a finite program and Q an atomic query. Then there
exists an SLG evaluation E = S_0, ..., S_α of P and Q if and only if there exists
an SLG_variance evaluation E_V = F_0, ..., F_β such that I_{S_α} = I_{F_β}.

Proof. The proof of this and other theorems is provided in the full version of 
this paper, available through http://www.cs.sunysb.edu/~tswift. 

4 Subsumption-Based Tabling with Delay 

The variance relation on atoms is used implicitly in several SLG_variance
operations; by replacing these uses by a subsumption relation on atoms,
SLG_subsumption is obtained. Specifically, in SLG_subsumption, for a New Subgoal
operation to be applicable in a forest, the new subgoal must not be subsumed by
any subgoal in the forest. Thus, a SLG_variance evaluation may perform New Subgoal
operations that are not necessary in SLG_subsumption. Similarly, the SLG_subsumption
Program Clause Resolution and Positive Return operations will produce
a child of a node N only if that child is not subsumed by any other child
of N. Other operations are also affected. A Simplification operation may be
applicable to a delay literal D if D ∈ I_F, rather than through the conditions
on delay literals specified in Definition 7. (Thus the form of positive delay
literals does not need the annotations described in Definition 2, but they are
maintained here for the sake of a uniform representation.) Finally, a subgoal may
become completely evaluated if it is subsumed by a subgoal that is also completely
evaluated. This last condition is reflected in the SLG_subsumption Completion
operation rather than by formulating a new definition of completely evaluated.

Definition 9 (SLG_subsumption Operations). Given a state F_n of an SLG_subsumption
evaluation of program P and query Q, where n is a non-limit ordinal, F_{n+1}
may be produced by one of the following operations.

1. New Subgoal: Let F_n contain a non-root node

N = Ans :- Delay_Set | G, Goal_List

where G is the selected literal S or not S. Assume F_n contains no tree with
root subgoal S' such that S' subsumes S. Then add the tree S :- | S to F_n.

2. Program Clause Resolution: Let F_n contain a root node N = S :- | S
and C be a program clause Head :- Body such that Head unifies with S
with mgu θ. Assume that in F_n, N does not have a child that subsumes
N_child = (S :- | Body)θ. Then add N_child as a child of N.

3. Positive Return: Let F_n contain a non-root node N whose selected literal
S is positive. Let Ans be an answer node in F_n and N_child be the SLG
resolvent of N and Ans on S. Assume that in F_n, N does not have a child
which subsumes N_child. Then add N_child as a child of N.

4. Negative Return: Let F_n contain a leaf node

N = Ans :- Delay_Set | not S, Goal_List

whose selected literal not S is ground.

(a) Negation Success: If not S ∈ I_{F_n}, then create a child for N of the
form: Ans :- Delay_Set | Goal_List.

(b) Negation Failure: If S ∈ I_{F_n}, then create a child for N of the form
fail.

5. Delaying: Let F_n contain a leaf node N = Ans :- Delay_Set | not S, Goal_List,
such that S is ground in F_n, but S is neither successful nor failed in I_{F_n}.
Then create a child for N of the form Ans :- Delay_Set, not S | Goal_List.

6. Simplification: Let F_n contain a leaf node N = Ans :- Delay_Set |, such
that L ∈ Delay_Set.

(a) If L is failed in I_{F_n}, then create a child fail for N.

(b) If L is successful in I_{F_n}, then create a child Ans :- Delay_Set' | for N,
where Delay_Set' = Delay_Set − L.

7. Answer Completion: Given a set of unsupported answers UA, create a
failed child for some answer Ans ∈ UA.

8. Completion: Let 𝒮 be a set of subgoals in F_n such that for each S ∈ 𝒮,

(a) S is completely evaluated; or

(b) S is subsumed by some subgoal S' such that the tree for S' exists in F_n
and is marked as complete.

Then mark as complete the tree for each S ∈ 𝒮.
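
The difference between the variance and subsumption checks in the New Subgoal operation can be sketched as follows. This Python fragment is an illustration under assumptions: terms are encoded as tuples `(functor, arg1, ...)`, variables are capitalized strings, the two terms being compared are taken to be standardized apart, and the names `subsumes`, `variant` and `needs_new_tree` are hypothetical.

```python
# Illustrative sketch: variant versus subsumption checks on subgoals.
# Terms: tuples (functor, arg1, ...); variables: capitalized strings; constants: other strings.

def is_var(term):
    return isinstance(term, str) and term[:1].isupper()

def match(general, specific, binding):
    """One-way matching: extend binding so that general, once instantiated, equals specific."""
    if is_var(general):
        if general in binding:
            return binding[general] == specific
        binding[general] = specific
        return True
    if isinstance(general, tuple) and isinstance(specific, tuple):
        if len(general) != len(specific) or general[0] != specific[0]:
            return False
        return all(match(g, s, binding) for g, s in zip(general[1:], specific[1:]))
    return general == specific

def subsumes(general, specific):
    return match(general, specific, {})

def variant(a, b):
    # Two terms are variants when each subsumes the other.
    return subsumes(a, b) and subsumes(b, a)

def needs_new_tree(table, goal, use_subsumption):
    """Decide whether New Subgoal applies to goal, given the root subgoals in table."""
    check = subsumes if use_subsumption else variant
    return not any(check(root, goal) for root in table)
```

With a table containing only p(X), the call p(a) needs a new tree under the variance check but not under the subsumption check, which is the saving in New Subgoal operations described above.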

Theorem 2. Let P be a program and Q a query. Then there exists an SLG_variance
evaluation E_V of Q against P with final state F_V if and only if there exists an
SLG_subsumption evaluation E_S of Q against P with final state F_S such that

I_{F_V} = I_{F_S}.

SLG_variance and SLG_subsumption can be considered as two extreme types of
tabled evaluation. From an implementational perspective, it may be useful to
mix the operations from different evaluation methods. For instance, by replacing
the SLG_variance New Subgoal operation with that of SLG_subsumption, an
evaluation may be said to use call subsumption. By replacing the SLG_variance
Positive Return operation with that of SLG_subsumption, an evaluation may
be said to use answer subsumption. Special cases of call and answer subsumption
have been discussed in the tabling literature for definite programs (see e.g. [18]).

5 Relevant Subgoals: an Example of a SLG Optimization

The previous theorems hold for any SLG_variance or SLG_subsumption evaluations:
that is, for any ordering of applicable operations. However, in order to apply
either method to practical programs, it can be useful to restrict evaluations to
have certain properties. For instance, the notion of incremental completion, of
applying a completion operation as soon as possible so that space for completed
trees can be reclaimed, was described in [14]. Here, we describe another
optimization, based on the notion of relevance. An SLG tree that is not relevant
to an initial query need have no operations performed on it and can in principle
be disposed of, reclaiming space. We begin by restating the definition of the
subgoal dependency graph (SDG), which provides a useful abstraction of an SLG
forest.

Definition 10 (Subgoal Dependency Graph). Let F be a forest in a SLG_X
evaluation. We say that a tabled subgoal S_1 directly depends on a tabled subgoal
S_2 in F iff neither the tree for S_1 nor that for S_2 is marked as complete and S_2
is the selected literal of some node in the tree for S_1. The Subgoal Dependency
Graph of F, SDG(F), is a directed graph (V, E) in which V is the set of root goals
for trees in F and (S_i, S_j) ∈ E iff S_i directly depends on S_j.
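
Definition 10 translates almost directly into code. The sketch below assumes a hypothetical forest encoding in which each tree records whether it is marked complete and which subgoals occur as selected literals in its nodes (for a negative literal not S, the subgoal S is recorded). The function name is likewise an assumption of this example.

```python
# Illustrative sketch of Definition 10.
# forest: {subgoal: {"complete": bool, "selected": [subgoals selected in its nodes]}}

def subgoal_dependency_graph(forest):
    """Return (V, E) where E holds the 'directly depends on' pairs."""
    vertices = set(forest)
    edges = set()
    for s1, tree in forest.items():
        if tree["complete"]:
            continue                      # completed trees have no outgoing edges
        for s2 in tree["selected"]:
            if s2 in forest and not forest[s2]["complete"]:
                edges.add((s1, s2))
    return vertices, edges

example_forest = {
    "p": {"complete": False, "selected": ["q", "r"]},
    "q": {"complete": False, "selected": ["p"]},
    "r": {"complete": True, "selected": []},
}
V, E = subgoal_dependency_graph(example_forest)
```

In this example p and q depend on each other, while the dependency of p on r is dropped because the tree for r is already marked as complete.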

Example 3. Figure 4 represents the SDG for the forest of Figure 1. 



Fig. 4. SDG for Forest of Figure 1



Since the dependency relation is non-symmetric, the SDG is a directed graph 
and can be partitioned into strongly connected components, or SCCs. In partic- 
ular an independent SCC is one which depends on no other SCC. 
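
The partition into SCCs, and the identification of independent SCCs, can be sketched with a standard algorithm. The Python below uses Tarjan's algorithm on a graph given as a vertex set and an edge set; the choice of algorithm and the function names are assumptions of this example rather than something prescribed by the text.

```python
# Illustrative sketch: SCCs of a dependency graph and the independent ones among them.

def strongly_connected_components(vertices, edges):
    """Tarjan's algorithm (recursive form, adequate for small graphs)."""
    succ = {v: [] for v in vertices}
    for a, b in edges:
        succ[a].append(b)
    index, low, on_stack, stack, sccs = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in succ[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component
            component = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            sccs.append(frozenset(component))

    for v in sorted(vertices):
        if v not in index:
            visit(v)
    return sccs

def independent_sccs(sccs, edges):
    """An SCC is independent when no edge leaves it for another SCC."""
    return [c for c in sccs if all(b in c for a, b in edges if a in c)]
```

For the graph with edges (p, q), (q, p) and (p, r), the components are {p, q} and {r}, and only {r} is independent.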

Example 4. Consider the program P_rel:

p :- q(X), r(Y), s(X).
q(a). q(b).
r(c). r(d).
s(b).

The tree for p in the SLG_variance evaluation for the query ?- p is shown in Figure
5. Note that once an answer for the subgoal r(Y) is obtained, further answers
become irrelevant to evaluation of p. However, in the case of the subgoal q(X),
answers beyond the first (q(a)) are in fact relevant for solving the goal p, even
though q(X) shares no variables with p.




Definition 11 captures the notion of when one query is relevant to another. 

Definition 11. Let F be an SLG forest, N = Ans :- Delay_Set | Goal_List be a
node in F, and S the root subgoal of N. A selected literal L in Goal_List is relevant
in N if

1. vars(L) ∩ vars(S) ≠ ∅; or

2. vars(L) ∩ vars(L') ≠ ∅, where L' is a non-selected literal in Goal_List.

Next, let (S_1, S_2) be an edge of SDG(F). (S_1, S_2) is relevant in SDG(F) if

— S_2 is the selected literal of a leaf node in S_1; or

— S_2 is the selected literal of a non-leaf node N in S_1 and S_2 is relevant in N.

A relevant path from a node S_1 to a node S_2 exists in SDG(F) if there is a path
of relevant edges from S_1 to S_2. If such a path exists, we say that S_2 is relevant
to S_1 in F.
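
The relevance conditions can be checked mechanically. The sketch below is a minimal illustration, assuming literals are encoded as `(predicate, args)` pairs with capitalized strings as variables; it also assumes that a query counts as relevant to itself, which the definition does not state explicitly. All function names are hypothetical.

```python
# Illustrative sketch of Definition 11 on the program of Example 4.
# Literals: (predicate, args); variables are capitalized strings.

def variables(literal):
    _, args = literal
    return {a for a in args if a[:1].isupper()}

def relevant_in_node(selected, others, root):
    """Conditions 1 and 2: the selected literal shares a variable with the root
    subgoal or with some non-selected literal of the node."""
    vs = variables(selected)
    if vs & variables(root):
        return True
    return any(vs & variables(lit) for lit in others)

def relevant_edge(is_leaf, selected, others, root):
    """An edge is relevant if it stems from a leaf node or from a relevant selected literal."""
    return is_leaf or relevant_in_node(selected, others, root)

def relevant_subgoals(query, relevant_edges):
    """Subgoals reachable from query along relevant edges (query itself included)."""
    seen, todo = {query}, [query]
    while todo:
        s = todo.pop()
        for a, b in relevant_edges:
            if a == s and b not in seen:
                seen.add(b)
                todo.append(b)
    return seen

root = ("p", ())
# Node p :- q(X), r(Y), s(X): q(X) shares X with s(X), so it is relevant.
q_relevant = relevant_in_node(("q", ("X",)), [("r", ("Y",)), ("s", ("X",))], root)
# Node p :- r(Y), s(a): r(Y) shares no variable with anything else.
r_relevant = relevant_in_node(("r", ("Y",)), [("s", ("a",))], root)
```

This reproduces the observation of Example 4: further answers for q(X) remain relevant, whereas once r(Y) has returned an answer (so that its node is no longer a leaf) the edge to r(Y) ceases to be relevant.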

A relevant evaluation can be defined by constraining operations to act within 
trees that are relevant to the original query. 

Definition 12 (Relevance). A VSLG evaluation F_0, F_1, ... of a query
Q against a program P is relevant if every New Subgoal, Program Clause
Resolution, Positive Return, Delaying, Negative Return, and Simplification
operation O creating F_n from F_{n-1} is applied to a node N whose
root subgoal S is relevant to Q in SDG(F_{n-1}).

Note that relevance is defined as a property of forests, and is thus well defined
for each forest in a possibly transfinite evaluation.

Theorem 3. Let F_0, ..., F_β be a relevant evaluation of a query Q against
a program P, and let M_P represent the well-founded model of P. Then

“ = M.^\q 

— For any VSLG evaluation, ,PY ■< ■ ■ ■ ■< ^ of Q against P, \< li- 

Programs and queries can be easily constructed in which relevant evaluations
terminate finitely and non-relevant evaluations do not. Relevance in a
tabled evaluation thus captures aspects of the Prolog cut, as well as existential
query optimization for deductive databases [13]. Aspects of relevance are being
explored to allow tabled logic programs to implement partial-order model
checking. Relevance differs from the cut, however, in that irrelevant trees are not
necessarily removed from forests; rather, operations on these trees can be
postponed. If this strategy of keeping uncompleted irrelevant trees is adopted, it can
be shown that relevant SLG evaluations maintain the polynomial data complexity
properties of general SLG. How to implement relevant evaluations is still
an open question, particularly the determination of whether relevance should be
determined dynamically as part of an engine, or should be informed by analysis.



6 Discussion 

As originally formulated, SLG cannot be learned and used without a relatively
large amount of intellectual commitment; the forest of trees model that underlies
SLG_X may well reduce this commitment. In addition, the formulation of SLG_X
makes it easier to formulate alternate sets of operations, as has been demonstrated
by the creation of SLG_subsumption. Other extensions are presently being
formulated and implemented by using new sets of SLG_X operations: for abduction
over the well-founded semantics [1], for generalized annotated programs [17],
and to formalize algorithms for distributed tabling [9]. Also, as tabling becomes
used for serious implementation of powerful non-monotonic logics, program
optimizations become increasingly important, along with the necessity to prove
these optimizations correct. It is hoped that the reformulation described in this
paper will make such efforts easier than they have been to date.

Abstract. YapOr is an or-parallel system that extends the Yap Prolog
system to exploit implicit or-parallelism in Prolog programs. It is based
on the environment copying model, as first implemented in Muse. The
development of YapOr required solutions for some important issues, such as
designing the data structures to support parallel processing, implementing
the incremental copying technique, developing a memory organization
able to respond efficiently to parallel processing and to incremental
copying in particular, implementing the scheduler strategies, designing
an interface between the scheduler and the engine, implementing the
work-sharing process, and implementing support for the cut builtin.

An initial evaluation of YapOr performance showed that it achieves very 
good performance on a large set of benchmark programs. Indeed, YapOr 
compares favorably with a mature parallel Prolog system such as Muse, 
both in terms of base speed and in terms of speedups. 

Keywords: Parallel Logic Programming, Scheduling, Performance. 



1 Introduction 

Prolog is arguably the most important logic programming language. It has been
used for all kinds of symbolic applications, ranging from Artificial Intelligence
to Database or Network Management. Traditional implementations of Prolog
were designed for common, general-purpose sequential computers. In fact,
WAM [18] based Prolog compilers proved to be highly efficient for standard
sequential architectures and have helped to make Prolog a popular programming
language. The efficiency of sequential Prolog implementations and the
declarativeness of the language have kindled interest in implementations for parallel
architectures. In these systems, several processors work together to speed up the
execution of a program. Parallel implementations of Prolog should obtain better
performance for current programs, whilst expanding the range of applications
we can solve with this language.

Two main forms of implicit parallelism are present in logic programs.
And-Parallelism corresponds to the parallel evaluation of the various goals in
the body of a clause. This form of parallelism is usually further subdivided
into Independent And-Parallelism, in which the goals are independent, that is,
they do not share variables, and Dependent And-Parallelism, in which goals may
share some variables with others. In contrast, Or-Parallelism corresponds to the
parallel execution of alternative clauses for a given predicate goal.



Original research in the area resulted in several systems that successfully
supported either and-parallelism or or-parallelism. These systems were shown
to obtain good performance on classical shared-memory parallel machines, such
as the Sequent Symmetry. Towards more flexible execution, recent research has
investigated how to combine both and- and or-parallelism, and how to support
extensions to logic programming such as constraints or tabling.



Of the forms of parallelism available in logic programs, or-parallelism is
arguably one of the most successful. Experience has shown that or-parallel systems
can obtain very good speedups for a large range of applications, such as those that
require search. Designers of or-parallel systems must address two main problems,
namely scheduling and variable binding representation. In or-parallel systems,
available unexploited tasks arise irregularly and thus careful scheduling is
required. Several strategies have been proposed for this problem.



The binding representation problem is a fundamental problem that arises 
because the same variable may receive different bindings in different or-branches. 
A number of approaches have been presented to tackle the problem. Two 
successful ones are environment copying, as used in Muse [2], and binding arrays, 
as used in Aurora. In the copying approach, each worker maintains its own
copy of the path in the search tree it is exploring. Whenever work needs to 
be shared, the worker that is moving down the tree copies the stacks from the 
worker that is giving the work. In this approach, data sharing between workers 
only happens through an auxiliary data structure associated with choice points. 



In contrast, in the binding array approach work stacks are shared. To obtain 
efficient access, each worker maintains a private data structure, the binding 
array, where it stores its conditional bindings. To allow for quick access to the 
binding of a variable the binding array is implemented as an array, indexed by 
the number of variables that have been created in the current branch. The same 
number is also stored in the variable itself, thus giving constant-time access to 
private variable bindings. 
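
The binding array idea can be illustrated with a very small model. The Python sketch below is not taken from Aurora; the class names `SharedVariable` and `Worker` and the `UNBOUND` marker are hypothetical, and the model ignores the distinction between conditional and unconditional bindings.

```python
# Illustrative sketch: shared variable cells with per-worker private binding arrays.

UNBOUND = object()

class SharedVariable:
    """A variable cell in the shared stacks; it only records its own number."""
    def __init__(self, index):
        self.index = index

class Worker:
    def __init__(self, name):
        self.name = name
        self.binding_array = []          # private to this worker

    def bind(self, var, value):
        # Grow the private array on demand, then store the binding at the variable's index.
        while len(self.binding_array) <= var.index:
            self.binding_array.append(UNBOUND)
        self.binding_array[var.index] = value

    def lookup(self, var):
        # Constant-time access through the index stored in the variable itself.
        if var.index < len(self.binding_array):
            return self.binding_array[var.index]
        return UNBOUND

x = SharedVariable(0)
w1, w2, w3 = Worker("w1"), Worker("w2"), Worker("w3")
w1.bind(x, "a")    # one or-branch binds X to a
w2.bind(x, "b")    # another or-branch binds the same X to b
```

Each worker sees only its own binding for the shared variable, and a worker that has not bound the variable (here `w3`) still sees it as unbound.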

Initial implementations of or-parallelism, such as Aurora or Muse, relied on
detailed knowledge of a specific Prolog system, SICStus Prolog. Further, they
were designed for the original shared memory machines, such as the Sequent
Symmetry. Modern Prolog systems, even if emulator based, have made substantial
improvements in sequential performance. These improvements largely result
from the fact that though most Prolog systems are still based on the Warren
Abstract Machine, they exploit several optimizations not found in the original
SICStus Prolog. Moreover, the impressive improvements in CPU performance
over recent years have not been matched by corresponding improvements in bus
and memory performance. As a result, modern parallel machines show a much
higher latency, as measured by the number of CPU clock cycles, than the original
parallel architectures.

The question therefore arises of whether the good results previously obtained
with Muse or Aurora on Sequent-style machines are repeatable with other
Prolog systems on modern parallel machines. In this work, we present YapOr,
an or-parallel Prolog system that is based on the high performance Yap Prolog
compiler [3], and demonstrate that the low overheads and good parallel speedups
are in fact repeatable for a new system on a very different architecture.

The implementation of or-parallelism in YapOr is largely based on the
environment copying model as first introduced by Ali and Karlsson in the Muse
system. We chose the environment copying model because of the simplicity
and elegance of its design, which makes it simpler to adapt to a complex
Prolog system such as Yap, and because of its efficiency, as Muse has consistently
demonstrated lower overheads than competing or-parallel systems such as Aurora.
However, in order to support other or-parallel models, the system has been
designed so that it can easily be adapted to alternative execution models.

The substantial differences between YapOr and Muse resulted in several
contributions from our design. YapOr uses a novel memory organization and locking
mechanisms to ensure mutual exclusion. We introduce a different mechanism to
handle backtracking, as an extension of WAM instructions, and not through a
SICStus specific mechanism. As in the original Yap, YapOr uses just one stack
for environments and choice points, in contrast to SICStus, which uses two.
This requires adjustments to the sharing and synchronization procedures when
two workers share work. It also requires different formulas to calculate the
portions of the stacks that have to be copied when work sharing takes place. YapOr
introduces a new protocol to handle the cut predicate and a new scheme to
support the solutions that are being found by the system and that may correspond
to speculative work.

Performance analysis showed that parallel performance was superior to that 
of the original Muse, and better than the latest Muse as available with the current 
commercial implementation of SICStus Prolog. A first YapOr implementation 
has been integrated in the freely distributable YAP system.

The remainder of the paper is organized as follows. First we present the 
general concepts of the Environment Copying Model. Next, we introduce the 
major implementation issues in YapOr. We then give a detailed performance 
analysis for a standard set of benchmarks. Last, we present our conclusions and 
further work. 



2 The Environment Copying Model 

Like previous systems, YapOr uses the multi-sequential approach. In this
approach, workers (or engines, or processors, or processes) are expected to spend
most of their time performing reductions, corresponding to useful work. When
they have no more goals or branches to try, workers search for work from fellow
workers. Which workers they ask for work and which work they receive is a
function of the scheduler.



Basic Execution Model 

Parallel execution of a program is performed by a set of workers. Initially all 
workers but one are idle, that is, looking for their first work assignment. Only
one worker, say P, starts executing the initial query as a normal Prolog engine. 
Whenever P executes a predicate that matches several execution alternatives, 
it creates a choice point (or node) in its local stack to save the state of the 
computation at predicate entry. This choice point marks the presence of potential 
work to be performed in parallel. 

As soon as an idle worker finds that there is work in the system, it will request
that work directly from a busy worker. Consider, for example, that worker Q
requests work from worker P. If P has available work, it will share its local choice
points with Q. To do so, worker P must first make its choice points public. In
the environment copying model this operation is implemented by allocating
or-frames in a shared space to synchronize access to the newly shared choice points.
Next, worker P will hand Q a pointer to the bottom-most shared choice point.

The next step is taken by worker Q. In order for Q to take a new task, it must
copy the computation state from worker P up to the bottom-most shared choice 
point. After copying, worker Q must synchronize its status with the newly copied 
computation state. This is done by first simulating a failure to the bottom-most 
choice point and then by backtracking to the next available alternative within 
the branch and starting its execution as a normal sequential Prolog engine would. 

At some point, a worker will fully explore its current sub-tree and become 
idle again. In this case, it will return into the scheduler loop and start looking 
for busy workers in order to request work from them. It thus enters the behavior 
just described for Q. Eventually the execution tree will be fully explored and 
execution will terminate with all workers idle. 



Incremental Copying 

The work-sharing operation poses a major overhead to the system as it involves
the copying of the execution stacks between workers. Hence, an incremental
copying strategy [2] has been devised in order to minimize this source of overhead.

The main goal of sharing work is to position the workers involved in the
operation at the same node of the search tree, leaving them with the same
computational state. Incremental copying achieves this goal by having the receiving
worker keep the part of its state that is consistent with the giving worker,
and copy only the differences between the two.

This strategy can be better understood through Fig. 1. Suppose that worker
Q does not find work in its branch, and that there is a worker P with available
work. Q asks P for sharing, and backtracks to the first node that is common to P,
therefore becoming partially consistent with part of P. Consider that worker P
decides to share its private nodes and Q copies the differences between P and Q.
These differences are calculated through the information stored in the common
node found by Q and in the top registers of the local, heap and trail stacks of P.
To fully synchronize the computational state between the two workers, worker
Q needs to install from P the bindings trailed in the copied segments that refer
to variables stored in the maintained segments.
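
The following Python sketch illustrates the idea of incremental copying on a toy model. It is a simplification under stated assumptions: stacks are Python lists that all grow upwards, the common node is represented by the stack tops saved when it was created (`common_tops`), and trail entries are indices of heap cells only. The function name and data layout are hypothetical and do not reflect the actual YapOr data structures.

```python
# Illustrative sketch of incremental copying between a giving worker p and a receiving worker q.
# Each worker is a dict of lists: {"heap": [...], "local": [...], "trail": [...]}.

def incremental_copy(p, q, common_tops):
    """Copy to q only the stack segments that p created after the common node."""
    for name in ("heap", "local", "trail"):
        base = common_tops[name]
        # q keeps the segment below the common node and copies the rest from p.
        q[name] = q[name][:base] + p[name][base:]
    # Install bindings trailed in the copied trail segment that refer to
    # heap cells in the maintained (not copied) segment.
    for cell in p["trail"][common_tops["trail"]:]:
        if cell < common_tops["heap"]:
            q["heap"][cell] = p["heap"][cell]
    return q

p = {"heap": ["f", "a", "g", "h"], "local": ["e0", "e1"], "trail": [1]}
q = {"heap": ["f", None], "local": ["e0"], "trail": []}
tops = {"heap": 2, "local": 1, "trail": 0}
incremental_copy(p, q, tops)
```

After the call, q holds the same state as p: heap cell 1, which q kept from its own stacks, receives the binding that p made after the common node was created.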




Fig. 1. Some aspects of incremental copying. 



Scheduling Work 

We can divide the execution time of a worker into two modes: scheduling mode and
engine mode. A worker enters scheduling mode whenever it runs out of work
and starts searching for available work. As soon as it gets a new piece of work,
it enters engine mode. In this mode, a worker runs like a standard Prolog
engine.

The scheduler is the system component that is responsible for distributing
the available work among the various workers. The scheduler must arrange the
workers in the search tree in such a way that the total time of a parallel execution
is as short as possible. The scheduler must also maintain the correctness of
Prolog sequential semantics and minimize the scheduling overheads present in
operations such as sharing nodes, copying parts of the stacks, backtracking,
restoring and undoing previous variable bindings.

To achieve these goals, the scheduler uses the following strategies:

— When a busy worker shares work, it must share all the private nodes it has
at that moment. This maximizes the amount of shared work and may
prevent the requesting worker from running out of work too early.

— The scheduler selects the busy worker that simultaneously holds the highest
work load and is nearest to the idle worker. The work load is a measure
of the amount of unexplored private alternatives. Being near corresponds to
the closest position in the search tree. This strategy maximizes the amount
of shared work and minimizes the parts of the stacks to be copied.

— To guarantee the correctness of a sharing operation, it is necessary that the
idle worker is positioned at a node that belongs to the branch on which the
busy worker is working. To minimize overheads, the idle worker backtracks
to the bottom common node before requesting work. This reduces the time
spent by the busy worker in the sharing operation.

— If at a certain time the scheduler does not find any available work in the
system, it backtracks the idle worker to a better position, if available, in the
search tree that should minimize the overheads of a future sharing operation.

We can summarize the scheduler algorithm as follows: when a worker runs out
of work it searches for the nearest unexplored alternative in its branch. If there
is no such alternative, it selects a busy worker with excess work load to share
work according to the strategies above. If there is no such worker, the idle
worker tries to move to a better position in the search tree.
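
The decision made by an idle worker can be sketched as a single function. The Python below is a schematic illustration only: the inputs are hypothetical summaries of the search tree, and combining "highest work load" and "nearest" as a lexicographic preference (load first, then distance) is an assumption of this example, since the text does not say how the two criteria are weighed.

```python
# Illustrative sketch of one step of the scheduler for an idle worker.

def scheduler_step(own_alternatives, busy_workers):
    """
    own_alternatives: unexplored alternatives in the idle worker's branch, nearest first.
    busy_workers: list of (name, load, distance) for busy workers with excess load.
    Returns the action the idle worker should take.
    """
    if own_alternatives:
        # 1. Take the nearest unexplored alternative in the worker's own branch.
        return ("take", own_alternatives[0])
    if busy_workers:
        # 2. Otherwise request sharing from the most loaded, then nearest, busy worker.
        name, _, _ = max(busy_workers, key=lambda w: (w[1], -w[2]))
        return ("share", name)
    # 3. Otherwise try to move to a better position in the search tree.
    return ("move", None)
```

For example, with no alternatives of its own and busy workers `("P", 3, 2)`, `("R", 3, 1)` and `("S", 1, 0)`, the idle worker would request work from R, which ties with P on load but is nearer.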

There are two alternatives to search for busy workers in the search tree: search
below or search above the current node. Idle workers always start to search
below the current node, and only if they do not find any busy worker there do
they search above. There are two main advantages of selecting a busy worker
below instead of above. The first is that the idle worker can request
the sharing operation immediately, because its current node is already common to
the busy worker. This avoids backtracking in the tree and undoing variable
bindings. The other advantage is that the idle worker will maintain its relative
position in the search tree, but restart the execution at a lower level. This
corresponds to the bottom-up scheduling strategy of the environment copying model,
which has proved to be more efficient than the top-down one.

As mentioned before, when the scheduler does not find unexplored alternatives
and no busy workers below or above, it tries to move the idle worker to a
better position in the search tree. An idle worker moves to a better position if
all workers below the current node are idle, or if there are busy workers above
and no idle workers higher in its branch.

In the first situation, the idle worker backtracks until it reaches a node where
there is at least one busy worker below. In the second one, it backtracks until
it reaches the node that contains all the busy workers below. With this scheme,
the scheduler tries to distribute the idle workers in such a way that the
probability of finding, as soon as possible, busy workers with excess work
below the corresponding idle workers’ current nodes is substantially increased.

3 Extending Yap to Support YapOr 

We next discuss the main issues in extending the Yap Prolog system to support
parallelism based on environment copying.

Memory Organization

Following the original WAM definition, the Yap Prolog system includes four
main memory areas: code area, heap, local stack and trail. The local stack
contains both environment frames and choice points. Yap also includes an
auxiliary area used to support some internal operations.

The YapOr memory is divided into two large address spaces: the global
space and a collection of local spaces (see Fig. 2). The global space is divided
into two areas and contains the global data structures necessary to support
parallelism. The first area includes the code area inherited from Yap, and a
global information area that supports the parallel execution. The second one,
the Frames Area, is where the three types of parallel frames are allocated during
execution. Each local space represents one system worker. It contains a local
information area with individual information about the worker, and the four
WAM execution stacks inherited from Yap: heap, local, trail and auxiliary stack.

Fig. 2. Memory organization in YapOr.



In order to efficiently meet the requirements of incremental copying, we follow
the principles used in Muse and implement YapOr's memory organization
with the mmap function. This function lets us map a file on disk into a buffer in
memory so that, when we fetch bytes from the buffer, the corresponding bytes
of the file are read. When mapping, the memory buffers can be declared to be
private or shared. Obviously, we are interested in shared memory buffers.

The starting worker, that is, worker 0, uses mmap to ask for shared memory in
the system's initialization phase. Afterwards, the remaining workers are created
through the fork function, and inherit the address space previously
mapped. Then, each new worker rotates all the local spaces, in such a way that
all workers see their own spaces at the same address.

This mapping scheme allows for efficient memory copying operations. To
copy a stack segment between different workers, we simply copy directly from
a given address in one worker to the same relative address in the other
worker's address space. Note that relocation of address values in the copied
segments is not necessary, because in all workers the stacks are located at the
same addresses.
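
A minimal sketch of the copying idea follows. It is an illustration under
simplifying assumptions, not YapOr's code: a single anonymous shared mapping
stands in for the collection of local spaces, the names LOCAL_SPACE, base and
copy_segment are hypothetical, and no processes are forked. The point it shows
is that a segment is copied to the same relative offsets in the destination
worker's space, so no pointer inside the segment needs adjusting.

```python
import mmap

LOCAL_SPACE = 4096          # assumed size of one worker's local space
N_WORKERS = 2

# One anonymous mapping holding every worker's local space side by side.
shared = mmap.mmap(-1, LOCAL_SPACE * N_WORKERS)


def base(worker: int) -> int:
    """Start offset of a worker's local space inside the mapping."""
    return worker * LOCAL_SPACE


def copy_segment(src: int, dst: int, start: int, end: int) -> None:
    """Copy the stack segment [start, end) from worker src to worker dst.

    The segment lands at the same relative offsets in dst's space.
    """
    shared[base(dst) + start:base(dst) + end] = \
        shared[base(src) + start:base(src) + end]


# Worker 0 owns some stack contents; worker 1 receives a copy of them.
shared[base(0) + 100:base(0) + 108] = b"segment!"
copy_segment(0, 1, 100, 108)
```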



Choice Points and Or-Frames

In order to correctly execute the alternatives in a shared choice point, it is
necessary to avoid possible duplicate exploitation of alternatives, as different
workers can reference the choice point. Figure 3 represents the new structure of
the choice points. The first six fields are inherited from Yap, while the last two
were introduced in YapOr. The CP_PUA field contains the number of private
unexplored alternatives in upper choice points, and is used to compute the
worker's load. When a choice point is shared, the CP_OR_FR field saves a pointer
to the corresponding or-frame associated with it. Otherwise, it is not used.

Fig. 3. Sharing a choice point.



A fundamental task when sharing work is to turn private choice points into
public ones. Figure 3 illustrates the relation between the choice points before
and after that operation, and the resulting connection with the corresponding
or-frame created in the meantime. The CP_ALT and CP_OR_FR choice point fields
are updated to point, respectively, to the getwork pseudo-instruction (see next
section) and to the newly created or-frame. The or-frame is initialized as
follows: the next_alternative field takes the ALT pointer that was previously in
the CP_ALT choice point field (i.e., control of the untried alternatives passes
to the or-frame); the workers involved in the sharing operation are marked in the
workers_bitmap field; the nearmost_live_node field points to the parent choice
point; and the lock field is left free to allow access to the structure.

New Pseudo-Instructions

YapOr introduces only two new instructions over Yap. One of them is the
already mentioned getwork, and the other is the getwork_first_time instruction.
These two instructions are never generated by the compiler. They are introduced
according to the progress of the parallel execution.

As mentioned earlier, a pointer to special code containing the getwork
instruction is stored in the CP_ALT choice point field when the choice point is
being turned public. This way, when a worker backtracks to a shared choice
point, it executes the getwork instruction in place of the next untried
alternative. The execution of this instruction provides synchronized access to
untried alternatives among the workers sharing the corresponding or-frame.
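
The interplay between a public choice point, its or-frame and getwork can be
sketched as follows. This is a simplified model rather than YapOr's actual data
layout: the field names follow the text, but the untried alternatives are held
as a Python list instead of a single ALT pointer, GETWORK is a stand-in for the
pointer to the pseudo-instruction, and make_public and getwork are hypothetical
helper names.

```python
import threading
from dataclasses import dataclass, field
from typing import List, Optional, Set

GETWORK = "getwork"  # stand-in for the pointer to the getwork pseudo-instruction


@dataclass
class OrFrame:
    next_alternative: List[str]                 # untried alternatives, now owned here
    workers_bitmap: Set[int]                    # workers sharing the choice point
    nearmost_live_node: Optional["ChoicePoint"]
    lock: threading.Lock = field(default_factory=threading.Lock)


@dataclass
class ChoicePoint:
    cp_alt: object                              # untried alternatives, or GETWORK once public
    cp_or_fr: Optional[OrFrame] = None          # unused while the choice point is private


def make_public(cp: ChoicePoint, workers: Set[int],
                parent: Optional[ChoicePoint]) -> None:
    """Turn a private choice point public by attaching a fresh or-frame."""
    frame = OrFrame(list(cp.cp_alt), set(workers), parent)
    cp.cp_alt = GETWORK
    cp.cp_or_fr = frame


def getwork(cp: ChoicePoint) -> Optional[str]:
    """Take the next untried alternative under the or-frame lock, if any."""
    frame = cp.cp_or_fr
    with frame.lock:
        if frame.next_alternative:
            return frame.next_alternative.pop(0)
    return None
```

Because every access to next_alternative goes through the lock, two workers
backtracking to the same shared choice point can never take the same alternative.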

Whenever the search tree for the top level goal is fully explored, all the
workers, except worker 0, execute the getwork_first_time instruction. This
instruction puts the workers in a delay state, waiting for a signal from worker
0 indicating the beginning of a new goal. Worker 0, on the other hand, is
responsible for reporting all the solutions found for the last exploited goal
and for managing the interface with the user until a new query goal is posed.



Worker’s Load 

Each worker holds a local register, load, as a measure of the number of private
unexploited alternatives. The value in this register is used by the scheduler to
select the best worker to share work with an idle worker. When a choice point
is created, the field CP_PUA is initialized with the number of private untried
alternatives in the previous choice points. We do not include in this calculation
values relative to the current choice point, to avoid frequent updates when
backtracking occurs. The value of the load register is the sum of the previous
value plus the number of untried alternatives in the created choice point.

A great number of Prolog programs contain predicates with relatively small
tasks. To attain good performance in the system it is fundamental to avoid
sharing such fine-grained work. In YapOr the update of the load register is
delayed. It is only updated after a certain number of Call instructions are
executed and whenever a sharing operation takes place, in which case it is reset
to zero. A positive value in the load register indicates to the scheduler that
the worker has sharable work. By delaying the update of the load register, we
want to encourage the workers to build up a reserve of local work and avoid
over-eager sharing.
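
The delayed update can be modelled as below. This is a sketch under stated
assumptions rather than the actual implementation: LoadTracker and its method
names are hypothetical, the threshold calls_between_updates is a free
parameter (the text only says "a certain number" of Call instructions), and a
sharing operation is modelled by simply forgetting the private choice points.

```python
class LoadTracker:
    """Toy model of a worker's load register with delayed updates."""

    def __init__(self, calls_between_updates: int) -> None:
        self.calls_between_updates = calls_between_updates
        self.calls = 0
        self.load = 0                 # value visible to the scheduler
        self.choice_points = []       # (cp_pua, untried alternatives) pairs

    def push_choice_point(self, untried: int) -> None:
        # CP_PUA counts private untried alternatives in the previous choice points.
        if self.choice_points:
            prev_pua, prev_untried = self.choice_points[-1]
            cp_pua = prev_pua + prev_untried
        else:
            cp_pua = 0
        self.choice_points.append((cp_pua, untried))

    def private_work(self) -> int:
        if not self.choice_points:
            return 0
        cp_pua, untried = self.choice_points[-1]
        return cp_pua + untried

    def on_call(self) -> None:
        # The register is refreshed only every calls_between_updates calls.
        self.calls += 1
        if self.calls >= self.calls_between_updates:
            self.calls = 0
            self.load = self.private_work()

    def on_share(self) -> None:
        # After sharing, the private work has become public: reset to zero.
        self.choice_points.clear()
        self.calls = 0
        self.load = 0
```

With a threshold of 3, a worker that has just created two choice points still
advertises a load of 0 until its third call, which is what discourages sharing
of very small tasks.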



Sharing Work 

It is through the sharing process that parallel execution of goals becomes
possible. This process takes place when an idle worker makes a sharing request
to a busy worker and receives a positive answer. It can be divided into four
main steps.

The Initial step is where the auxiliary variables are initialized and the stack
segments to be copied are computed. The Sharing step is where the private
choice points are turned into public ones. The Copy step is where the computed
segments are copied from the busy worker's stacks to the idle worker's ones.
Finally, the Installation step is where the conditional variables in the
maintained stacks of the idle worker are updated to the bindings present in the
busy worker's stacks.

To minimize the overheads in sharing, both workers cooperate in the execution
of the four steps (see Fig. 4). The idea is as follows: after a common initial
step, the worker P with excess load starts the sharing step while the idle worker
Q starts the copy step. Worker Q copies the stacks from P in the following order:
trail, heap and local stack. The local stack can only be copied after P finishes
its sharing step. After the sharing step, P can help Q in the copy step, if it
has not yet concluded. It copies the stacks to Q, but in reverse order. This
scheme has proved to be efficient because it avoids some useless variable checks
and locks. Finally, worker P returns to its Prolog execution while worker Q
executes the installation step and starts a new task from the recently installed
work. If, meanwhile, worker P backtracks to a shared node, it has to wait until Q
finishes its installation step in order to avoid possible undoing of variable
bindings.

Fig. 4. Synchronization between P and Q during the work-sharing process.



Implementation of Cut 

Prolog programs use the cut built-in, !, to discard unnecessary alternatives. On
the other hand, or-parallel execution means that workers may go ahead and
execute work that will later be pruned by cut. One says in this case that the
work is speculative.

YapOr currently implements a simple mechanism for cut. The worker executing
cut must go up in the tree until it reaches either the root cut choice point
or a choice point with workers executing left branches. While going up it may
find workers in branches to the right. If so, it sends them a signal informing
them that their branches have been pruned. When receiving such a signal, workers
must backtrack to the shared part of the tree and become idle workers again.

Note that a worker may not be able to complete a cut if there are workers in 
left branches, as they can themselves prune the current cut. In these cases, one 
says the cut was left pending. In YapOr, pending cuts are only executed when all 
workers to the left finish their private work and backtrack into the public part 
of the tree. It will then be their responsibility to continue these cuts. 

The existence of cuts and speculative work in the tree also affects scheduling.
Muse implements a sophisticated strategy to avoid scheduling workers onto
speculative work if there is non-speculative work available [3]. Further work is
still necessary to make YapOr deal efficiently with this kind of work.

4 Performance Evaluation 

The evaluation of YapOr was performed on a shared memory parallel machine, 
a Sun SparcCenter 2000 with 8 processors, 256 MBytes of main memory and a 
two level cache system. The system was running SunOS 5.6. The machine was 
otherwise idle while benchmarking. 

A number of benchmark Prolog programs commonly used to assess other 
parallel Prolog systems were used to assess YapOr. All benchmarks find all the 
solutions for the problem. Multiple solutions are computed through “automatic 
backtracking on failure” after a solution has been found. We measured the ti- 
mings and speedups for each benchmark, compared YapOr’s performance with 
that of Muse, and analyzed the parallel activities to identify potential sources of 
overhead. 



Timings and Speedups 

To put the performance results in perspective we first compare YapOr’s perfor- 
mance, configured with one worker, with the performance of Yap Prolog on 
the same machine. We would expect YapOr to be slower than Yap. YapOr 
must update the local work load register, check for sharing solicitations and 
for backtracking messages due to cut operations, and perform small tests to ve- 
rify whether the bottom-most node is shared or private. It was found that Yap 
is on average 13% faster than YapOr. 

Table 1 presents the performance of YapOr with multiple workers. The table
presents the execution times in milliseconds, for the benchmark programs, with
speedups relative to the 1 worker case given in parentheses. The execution times
correspond to the best times obtained in a set of 10 runs.

Table 1. YapOr execution times and speedups.

                                  Number of Workers
Programs         1             2             4             6             7             8
puzzle      10.042   4.835(2.08)   2.316(4.34)   1.550(6.48)   1.339(7.50)   1.172(8.57)
9-queens     4.085   2.047(2.00)   1.026(3.98)   0.690(5.92)   0.596(6.85)   0.519(7.87)
ham          1.802   0.908(1.98)   0.474(3.80)   0.324(5.56)   0.281(6.41)   0.245(7.36)
5cubes       1.029   0.516(1.99)   0.260(3.96)   0.181(5.69)   0.170(6.05)   0.145(7.10)
8-queens2    1.063   0.606(1.75)   0.288(3.69)   0.202(5.26)   0.159(6.69)   0.149(7.13)
8-queens1    0.450   0.225(2.00)   0.118(3.81)   0.080(5.63)   0.072(6.25)   0.067(6.72)
nsort        2.089   1.191(1.75)   0.609(3.43)   0.411(5.08)   0.354(5.90)   0.315(6.63)
sm*10        0.527   0.274(1.92)   0.158(3.34)   0.128(4.12)   0.118(4.47)   0.115(4.58)
db5*10       0.167   0.099(1.69)   0.065(2.57)   0.068(2.46)   0.060(2.78)   0.061(2.74)
db4*10       0.133   0.079(1.68)   0.056(2.38)   0.055(2.42)   0.052(2.56)   0.060(2.22)
Σ           21.387  10.780(1.98)   5.370(3.98)   3.689(5.80)   3.201(6.68)   2.848(7.51)
Average                   (1.88)        (3.53)        (4.86)        (5.55)        (6.09)

The results show that YapOr is efficient in exploiting or-parallelism, giving
effective speedups over execution with just one worker. The quality of the
speedups achieved depends significantly on the amount of parallelism in the
program being executed. The programs in the first group, puzzle, 9-queens, ham,
5cubes, 8-queens2 and 8-queens1, have rather large search spaces, and are
therefore amenable to the execution of coarse-grained tasks. This group shows
very good speedups up to 8 workers. The speedups are still reasonably good for
the intermediate group, programs nsort and sm*10. For the last group, with
programs db5*10 and db4*10, the speedups are rather poor and level off very
quickly. The main reason for this is the very low task granularity present in
these programs.

It is interesting that in program puzzle, YapOr obtains super-linear speedups. 
This is probably due to lower miss rates, as the total cache size increases with 
the number of processors. 
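
As a worked check of how the figures in parentheses relate to the timings, the
sketch below recomputes the puzzle speedups from the execution times reported
in Table 1, using the usual definition speedup = T(1) / T(n). The helper names
speedup and efficiency are introduced here for illustration only; an
efficiency above 1 is what the text calls a super-linear speedup.

```python
def speedup(t1: float, tn: float) -> float:
    """Speedup of an n-worker run relative to the 1-worker run."""
    return t1 / tn


def efficiency(t1: float, tn: float, n: int) -> float:
    """Speedup per worker; values above 1.0 indicate super-linear speedup."""
    return speedup(t1, tn) / n


# Execution times for puzzle, copied from Table 1 (workers -> time).
puzzle = {1: 10.042, 2: 4.835, 4: 2.316, 6: 1.550, 7: 1.339, 8: 1.172}

puzzle_speedups = {n: round(speedup(puzzle[1], t), 2) for n, t in puzzle.items()}
super_linear = [n for n, t in puzzle.items()
                if n > 1 and efficiency(puzzle[1], t, n) > 1.0]
```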



YapOr and Muse 

Since YapOr is based on the same environment copying model as that used by the
Muse system, it is natural to compare YapOr's current performance with that of
Muse on a similar set of benchmark programs. We used the default compilation
flags for Muse under SICStus, and also the default parallel execution parameters.
The Muse version is the one available with SICStus Prolog release 3p6. Note that
Muse under SICStus is a more mature system, and implements functionality that
is still lacking in YapOr.

Table 2 shows the performance of Muse for the same set of benchmark
programs. YapOr benefits from the faster base performance of Yap, and manages
to obtain about 20% better execution times on a single processor.

Surprisingly, the results also show that YapOr obtains better speedup ratios
than Muse as the number of workers increases. This is the case even
though YapOr has better base performance. We suppose this might be a problem
with the default parameters used in Muse.

Table 2. Muse execution times and speedups.

                                  Number of Workers
Programs         1             2             4             6             7             8
puzzle      12.120   6.660(1.82)   3.720(3.26)   2.670(4.54)   2.230(5.43)   2.140(5.66)
9-queens     3.890   2.030(1.92)   1.110(3.54)   0.690(5.64)   0.630(6.17)   0.560(6.95)
ham          2.550   1.480(1.72)   0.820(3.11)   0.520(4.90)   0.520(4.90)   0.460(5.54)
5cubes       1.130   0.560(2.02)   0.280(4.04)   0.180(6.28)   0.160(7.06)   0.150(7.53)
8-queens2    1.350   0.690(1.96)   0.390(3.46)   0.270(5.00)   0.240(5.63)   0.220(6.14)
8-queens1    0.550   0.290(1.90)   0.160(3.44)   0.120(4.58)   0.110(5.00)   0.100(5.50)
nsort        2.650   1.450(1.83)   0.810(3.27)   0.550(4.82)   0.510(5.20)   0.450(5.89)
sm*10        0.670   0.360(1.86)   0.220(3.05)   0.170(3.94)   0.160(4.19)   0.150(4.47)
db5*10       0.190   0.110(1.73)   0.080(2.38)   0.070(2.72)   0.070(2.72)   0.070(2.72)
db4*10       0.160   0.090(1.78)   0.060(2.67)   0.070(2.29)   0.060(2.67)   0.070(2.29)
Σ           25.260  13.720(1.84)   7.650(3.30)   5.310(4.76)   4.690(5.39)   4.370(5.78)
Average                   (1.85)        (3.22)        (4.47)        (4.90)        (5.27)



Parallel Execution Overheads 

In this section we examine the various activities that take place during YapOr's
parallel execution. In particular, we timed the various activities a worker may
be involved in during execution, in order to help determine which of those
activities are causing a decrease in performance. The main activities traced are:

Prolog: time spent in Prolog execution, in verifying work-sharing requests and
in keeping the work-load register updated.

Search: time spent searching for a busy worker.

Sharing: time spent in the four phases of the work-sharing process.

Get-Work: time spent in obtaining a new alternative from a shared node. It
includes backtracking to that node and locking to ensure mutual exclusion.

Cut: includes the time spent in the execution of a cut in the shared region and
the time to move to another node whenever a cut operation takes place.

Table 3 shows the percentage of the execution time spent in each activity for
two benchmark programs, one in the high-parallelism class and the other in the
medium-parallelism class. The percentages were taken for a number of workers
ranging between 1 and 8.

The results show that, as the number of workers increases, the percentages of
the execution time spent on the Search and Sharing activities show the biggest
increase. This happens because of the increased competition for finding work,
which makes workers get smaller tasks and consequently increases the frequency
with which they have to search for new work. The increased competition also
causes unexploited alternatives to be situated more and more in the shared
part of the tree, which increases the Get-Work activity.

The percentage of the execution time spent executing the cut predicate is
minimal, indicating minimal overheads when executed in parallel.

Table 3. Workers' activities during execution.

                          Number of Workers
Activity          1       2       4       6       7       8
puzzle
  Prolog     100.00   99.95   99.56   99.20   99.02   98.68
  Search       0.00    0.02    0.16    0.32    0.41    0.60
  Sharing      0.00    0.02    0.17    0.32    0.38    0.50
  Get-Work     0.00    0.01    0.10    0.17    0.19    0.23
  Cut          0.00    0.00    0.00    0.00    0.00    0.00
sm
  Prolog     100.00   97.68   86.71   74.56   69.08   63.29
  Search       0.00    0.81    5.02   11.50   13.85   16.87
  Sharing      0.00    0.86    5.64   10.17   13.14   15.76
  Get-Work     0.00    0.61    2.51    3.52    3.61    3.88
  Cut          0.00    0.04    0.13    0.25    0.32    0.20

One possible explanation for the decrease in the amount of parallelism is
shown in Table 4. This table shows the average number of tasks executed by one
worker and the average size of a task. The size of a task may be defined as the
average number of goals (that is, Call instructions) executed within that task.



Table 4. Average number of tasks and call instructions per task.

                               Number of Workers
                         1       2       4       6       7       8
puzzle
  Tasks                  1     213    1780    2763    3095    3813
  Calls per task    171024     803      96      62      55      45
sm
  Tasks                  1      53     188     267     308     362
  Calls per task      7965     150      42      30      26      22

The table clearly shows that increasing the number of workers decreases the 
granularity of the available parallelism. The consequence of this is that workers 
run out of work more quickly and therefore the activities related to work search, 
work sharing and getting work will become more important in the execution, 
causing overheads to increase. It is therefore no surprise that the performance 
for benchmark programs such as salt-mustard degrades significantly for larger 
numbers of workers. 

5 Conclusions 

We have presented YapOr, an or-parallel Prolog system based on the environment
copying model. The system has good sequential and parallel performance
on a large set of benchmark programs. It was able to achieve excellent speedups
for applications with coarse-grained parallelism and it performs better than Muse
for the applications with medium parallelism. The good performance was also
explained by the fact that for most benchmarks YapOr spends its time mainly
executing reductions and not managing parallelism.

Recently, we have extended Yap to execute tabled logic programs. We are
now working on adjusting the YapOr system to support parallel execution of
such tabled logic programs.



Abstract. Implicants and implicates have proven to be powerful tools to
improve the deduction capabilities of automated theorem provers. In this
work, we focus on propositional temporal logic and propose
a new theoretical framework to capture maximum information about
implicants and implicates. Concretely, we study the structure of the sets
of unitary implicates and implicants and present the concept of base as
the smallest finite set that generates them. As we shall show, using bases
it is possible to handle the sets of implicants and implicates efficiently.
For this, we introduce a set of operators having linear cost.

Key Words: Temporal Logic, Theorem Proving, Theory of Computa- 
tion. 



1 Introduction 

In recent years, several fully automatic methods for verifying temporal
specifications have been introduced: a tableaux calculus has been treated at
length, a first introduction to the tableaux method for temporal logic can be
seen in [13], and a temporal resolution method has been presented. However, the
scope of these methods is still very limited. Theorem proving procedures for
temporal logics have been traditionally based on syntactic manipulations of the
formula A to be proven but, in general, they do not incorporate the substitution
of subformulae in A, unlike a rewrite system, in which the rewrite relation
preserves satisfiability. One source of interest of these strategies is that
they can be easily incorporated into any prover, specifically into those which
are non-clausal.

Our group has developed a new framework for building automated theorem 
provers, named TAS. TAS is a non clausal method which uses an efficient deter- 
mination and manipulation of sets of unitary implicants and implicates of the 
input formula to avoid branching (see [1|,|H],|B],[S|)- 

The computation of unitary implicants and implicates of non-clausal formulae
is required in several applications. The sets of unitary implicates and
implicants of a formula can be infinite and therefore difficult to handle. In this
work we study the structure of the sets of unitary implicates and implicants
and present the concept of base as the smallest finite set that generates them.
Working with bases instead of the sets of implicates and implicants, we can treat
them more efficiently. For this, we introduce a set of operators having linear cost.



P. Barahona and J.J. Alferes (Eds.): EPIA’99, LNAI 1695, pp. 193- I20fl 1999. 
(c) Springer-Verlag Berlin Heidelberg 1999

On the other hand, only a few papers currently deal with past and future
temporal formulae. The others use separated formulae, which allows them to
concentrate only on the future (or past) fragment. Nevertheless, this requires
transforming the input formula into a particular separated normal form. An
automatable method to do this transformation is presented in [1]. In this work,
we focus on the propositional linear time temporal logic with past and future
connectives. It supplies a framework to improve separation methods, which may
use the bases of implicants and implicates to design equivalent transformations
having linear cost. These transformations can be incorporated into the method
itself and, therefore, they can reduce the length of the formulae at the same
time as they are separated.

2 The FNext± Logic

In this paper, our object language is the Temporal Propositional Logic, FNext±,
with an infinite, linear and discrete flow of time, and connectives $\neg$
(negation), $\wedge$ (conjunction), $\vee$ (disjunction), $\to$ (material
implication), $F$ (sometime in the future), $G$ (always in the future),
$\oplus$ (tomorrow), $P$ (sometime in the past), $H$ (always in the past),
$\ominus$ (yesterday) and the symbols $\top$ (truth) and $\bot$ (falsity).
$\mathcal{V}$ denotes the set of propositional variables $p, q, r, \ldots$
(possibly with subscripts), which is assumed to be completely ordered
lexicographically, e.g., $p_n < q_m$ for all $n, m$ and $p_n < p_m$ if and only
if $n < m$.

Definition 1. The well-formed formulae (wffs) are generated by the construction
rules of classical propositional logic together with the following rule: if $A$
is a well-formed formula (wff), then $\oplus A$, $FA$, $GA$, $\ominus A$, $PA$
and $HA$ are wffs.

Definition 2 (Hintikka Structure). A temporal structure is a tuple
$S = (\mathbb{Z}, <, h)$, where $\mathbb{Z}$ is the set of integers, $<$ is the
standard ordering on $\mathbb{Z}$, and $h$ is a temporal interpretation, which
is a function $h : \mathrm{FNext}\pm \to 2^{\mathbb{Z}}$ satisfying:

1. $h(\top) = \mathbb{Z}$; $h(\bot) = \emptyset$; $h(\neg A) = \mathbb{Z} - h(A)$;
   $h(A \vee B) = h(A) \cup h(B)$; $h(A \wedge B) = h(A) \cap h(B)$;
   $h(A \to B) = (\mathbb{Z} - h(A)) \cup h(B)$.

2. $t \in h(\oplus A)$ if and only if $t + 1 \in h(A)$.

3. $t \in h(FA)$ if and only if there exists $t'$ such that $t < t'$ and $t' \in h(A)$.

4. $t \in h(GA)$ if and only if for all $t'$ with $t < t'$ we have $t' \in h(A)$.

5. $t \in h(\ominus A)$ if and only if $t - 1 \in h(A)$.

6. $t \in h(PA)$ if and only if there exists $t'$ such that $t > t'$ and $t' \in h(A)$.

7. $t \in h(HA)$ if and only if for all $t'$ with $t > t'$ we have $t' \in h(A)$.

A formula $A$ is said to be satisfiable if there exists a temporal structure
$S = (\mathbb{Z}, <, h)$ such that $h(A) \neq \emptyset$ and, in this case, if
$t \in h(A)$, then $h$ is said to be a model of $A$ in $t$; if
$h(A) = \mathbb{Z}$, then $A$ is said to be true in the temporal structure $S$,
and we denote it by $\models_S A$; if $A$ is true in every temporal structure,
then $A$ is said to be valid, and we denote it by $\models A$. Finally, $\equiv$
denotes the semantic equality, i.e., $A \equiv B$ if and only if for every
temporal structure $S = (\mathbb{Z}, <, h)$ we have $h(A) = h(B)$.

3 Temporal Literals

In this section, we introduce the set of literals and a partial order over it. 

Definition 3. We define a binary relation $\leq$ in FNext± as follows:

If $A$ and $B$ are wffs, $A \leq B$ if and only if $\models A \to B$.

Notice that $\leq$ is not an order relation in FNext±, although the relation
induced by $\leq$ on the set FNext±$/\!\equiv$, also denoted $\leq$, is a
partial order.

Definition 4. Given $p \in \mathcal{V}$, the wffs $p$ and $\neg p$ are the
classical literals on $p$. In the rest of the paper, $\ell_p$ will denote a
classical literal on $p$ and $\mathcal{V}^{\pm}$ will be the set of classical
literals.

Consider the set of wffs which do not have any binary connective, i.e., wffs of
the form $A = \gamma_1 \ldots \gamma_n \ell_p$ with
$\gamma_i \in \{F, G, \oplus, P, H, \ominus, \neg\}$ for all $1 \leq i \leq n$
and $\ell_p \in \mathcal{V}^{\pm}$. This set, ordered by $\leq$, is not a
partially ordered set, but its quotient by $\equiv$ is. The following laws allow
us to select a canonical form for each class.

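- $\neg\neg A \equiv A$.

- $\neg\oplus A \equiv \oplus\neg A$, $\neg FA \equiv G\neg A$,
  $\neg GA \equiv F\neg A$, $\neg\ominus A \equiv \ominus\neg A$,
  $\neg PA \equiv H\neg A$, $\neg HA \equiv P\neg A$.

- $FFA \equiv F\oplus A$; $GGA \equiv G\oplus A$; $PPA \equiv P\ominus A$;
  $HHA \equiv H\ominus A$.

- $HFA \equiv F\ominus A$; $PGA \equiv G\ominus A$; $GPA \equiv P\oplus A$;
  $FHA \equiv H\oplus A$.

- $\oplus\ominus A \equiv \ominus\oplus A \equiv A$; $HGA \equiv GHA$;
  $PFA \equiv FPA$.

- If $\gamma \in \{F, G, P, H\}$ then $\oplus\gamma A \equiv \gamma\oplus A$
  and $\ominus\gamma A \equiv \gamma\ominus A$.

- If $\gamma \in \{FG, GF, PH, HP, FP, GH\}$ then
  $\gamma\oplus A \equiv \gamma A$, $\gamma\ominus A \equiv \gamma A$,
  $F\gamma A \equiv \gamma A$; $G\gamma A \equiv \gamma A$;
  $P\gamma A \equiv \gamma A$ and $H\gamma A \equiv \gamma A$.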
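
As an illustration of how the first two laws are used, the following sketch
pushes negations through a string of unary connectives until at most one
negation remains, directly in front of the atom. It covers only the negation
laws, not the full canonical form. The ASCII encoding is an assumption made
for this example: "~" stands for negation, "+" for tomorrow and "-" for
yesterday, and push_negations is a hypothetical helper name.

```python
# Dual connective used when a negation is moved across it:
# F and G swap, P and H swap, tomorrow and yesterday are self-dual.
DUAL = {"F": "G", "G": "F", "P": "H", "H": "P", "+": "+", "-": "-"}


def push_negations(prefix: str, atom: str = "p"):
    """Return (negation-free prefix, literal) equivalent to prefix applied to atom."""
    out = []
    negated = False                      # parity of the negations seen so far
    for symbol in prefix:
        if symbol == "~":
            negated = not negated        # double negations cancel
        else:
            out.append(DUAL[symbol] if negated else symbol)
    literal = ("~" if negated else "") + atom
    return "".join(out), literal
```

For example, the prefix "~F+~G" applied to p is rewritten to "G+G" applied to
p, following the steps not-F to G-not, then not-tomorrow to tomorrow-not, and
finally cancelling the double negation.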

The elements of this quotient set will be named temporal literals and are
defined as follows:

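Definition 5. Given a classical literal $\ell_p \in \mathcal{V}^{\pm}$, the set
of temporal literals on $\ell_p$, denoted $Lit(\ell_p)$, are those wffs of the
set:

$Lit(\ell_p) = \{\top, \bot\} \cup \{FG\ell_p, GF\ell_p, PH\ell_p, HP\ell_p, FP\ell_p, GH\ell_p\}
\cup \{\oplus^k\ell_p, F\oplus^k\ell_p, G\oplus^k\ell_p, P\oplus^k\ell_p, H\oplus^k\ell_p \mid k \in \mathbb{Z}\}$

where $\oplus^k\ell_p$ stands for: $\oplus \ldots \oplus \ell_p$ ($k$ times) if
$k > 0$, $\ell_p$ if $k = 0$ and $\ominus \ldots \ominus \ell_p$ ($-k$ times) if
$k < 0$.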

Example 1. In this example, we display the canonical form of the wff
$A = \neg P \oplus GF \oplus\oplus \neg G \neg \ominus p$:

$\neg P \oplus GF \oplus\oplus \neg G \neg \ominus p \equiv HFGG \oplus\oplus\oplus\ominus \neg p \equiv FG \oplus\oplus \neg p \equiv FG\neg p$

As we will be concerned only with temporal literals, in the rest of the paper we
will drop the adjective temporal.

The pair $(Lit(\ell_p), \leq)$ is a poset but it is not a lattice because there
exist pairs of literals which do not have supremum or infimum. For example, the
set of upper bounds of $\{\ell_p, \oplus\ell_p\}$ has two minimal elements,
$P\oplus^2\ell_p$ and $F\ominus\ell_p$. The ordered set $(Lit(p), \leq)$ is
depicted in figure 1. However, if we consider only the future fragment,

$Lit^{+}(\ell_p) = \{\top, \bot\} \cup \{FG\ell_p, GF\ell_p\} \cup \{\oplus^k\ell_p, F\oplus^k\ell_p, G\oplus^k\ell_p \mid k \in \mathbb{N}\}$

we have that $(Lit^{+}(\ell_p), \leq)$ is a lattice. This lattice is shown in
figure 2.

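Definition 6. Let $\ell \in Lit(\ell_p)$. We define its upward and downward
closures as $\ell\!\uparrow = \{\ell' \in Lit(\ell_p) \mid \ell \leq \ell'\}$
and $\ell\!\downarrow = \{\ell' \in Lit(\ell_p) \mid \ell' \leq \ell\}$
respectively.

Besides this, if $\Gamma \subseteq Lit(\ell_p)$ then we define
$\Gamma\!\uparrow = \bigcup_{\ell \in \Gamma} \ell\!\uparrow$ and
$\Gamma\!\downarrow = \bigcup_{\ell \in \Gamma} \ell\!\downarrow$.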
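
The closures can be explored on a small finite fragment of the literals. The
sketch below is an illustration only: a literal is encoded as a pair (kind, k),
where "X", "F" and "G" stand for the tomorrow-power literal, its F-prefixed
form and its G-prefixed form with exponent k. The order test leq is derived
here directly from the semantics of Definition 2 rather than taken from the
paper, and the closures are computed relative to a finite universe instead of
the infinite set of literals.

```python
from itertools import product


def leq(a, b) -> bool:
    """Order between encoded literals, derived from the semantics (assumption)."""
    (ka, i), (kb, j) = a, b
    if ka == "X" and kb == "X":
        return i == j          # same time point
    if ka == "X" and kb == "F":
        return i >= j + 1      # the point i lies in the range searched by F
    if ka == "G" and kb == "X":
        return j >= i + 1      # the point j lies in the range covered by G
    if ka == "G" and kb == "G":
        return j >= i          # b covers a sub-range of a
    if ka == "F" and kb == "F":
        return j <= i          # b searches a super-range of a
    if ka == "G" and kb == "F":
        return True            # "always from some point" implies "sometime"
    return False


def up(lit, universe):
    """Upward closure of a literal inside a finite universe."""
    return {m for m in universe if leq(lit, m)}


def down(lit, universe):
    """Downward closure of a literal inside a finite universe."""
    return {m for m in universe if leq(m, lit)}


def up_set(lits, universe):
    """Upward closure of a set: the union of the closures of its elements."""
    return set().union(*(up(l, universe) for l in lits))


universe = set(product("XFG", range(-2, 3)))
common_upper = up(("X", 0), universe) & up(("X", 1), universe)
```

In this fragment the common upper bounds of the literals with exponents 0 and 1
are ("F", -1) and ("F", -2), and ("F", -1) is the smaller of the two, in
agreement with the minimal element F-yesterday mentioned in the text.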

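Let $\{\ell_1, \ldots, \ell_n\} \subseteq Lit(\ell_p)$. The only conjunctions or
disjunctions of literals such that $\bigvee_{i=1}^{n} \ell_i \in Lit(\ell_p)$ or
$\bigwedge_{i=1}^{n} \ell_i \in Lit(\ell_p)$ are obtained by applying,
iteratively, the following equivalences:

1. If $\ell_1 \leq \ell_2$ then $\ell_1 \wedge \ell_2 \equiv \ell_1$ and
   $\ell_1 \vee \ell_2 \equiv \ell_2$.

2. For all $k \in \mathbb{Z}$,
   $G\oplus^k\ell_p \wedge \oplus^k\ell_p \equiv G\oplus^{k-1}\ell_p$ and
   $F\oplus^k\ell_p \vee \oplus^k\ell_p \equiv F\oplus^{k-1}\ell_p$.

3. For all $k \in \mathbb{Z}$,
   $H\oplus^k\ell_p \wedge \oplus^k\ell_p \equiv H\oplus^{k+1}\ell_p$ and
   $P\oplus^k\ell_p \vee \oplus^k\ell_p \equiv P\oplus^{k+1}\ell_p$.

4. If $k_1 > k_2$ then
   $H\oplus^{k_1}\ell_p \wedge G\oplus^{k_2}\ell_p \equiv GH\ell_p$ and
   $P\oplus^{k_1}\ell_p \vee F\oplus^{k_2}\ell_p \equiv FP\ell_p$.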

For example, we can use the equivalences 2 and 4 to obtain:

G ©^ p A ©®p A ©^p A ©^p AiL©'‘p = G©^pAiJ©'‘p = GHp 
The following lemma will be used in the rest of the paper.

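Lemma 1. Let $\ell_p \in \mathcal{V}^{\pm}$, $\Gamma \subseteq Lit(\ell_p)$ a
finite set and $\ell_0 \in Lit(\ell_p)$.

1. $\models \bigwedge_{\ell \in \Gamma} \ell \to \ell_0$ if and only if
   there exists $\Gamma' \subseteq \Gamma$ such that
   $\bigwedge_{\ell \in \Gamma'} \ell \in Lit(\ell_p)$ and
   $\ell_0 \in (\bigwedge_{\ell \in \Gamma'} \ell)\!\uparrow$.

2. $\models \ell_0 \to \bigvee_{\ell \in \Gamma} \ell$ if and only if
   there exists $\Gamma' \subseteq \Gamma$ such that
   $\bigvee_{\ell \in \Gamma'} \ell \in Lit(\ell_p)$ and
   $\ell_0 \in (\bigvee_{\ell \in \Gamma'} \ell)\!\downarrow$.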

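Proof. We only show the proof of item 1. Item 2 is proved by duality.

The condition is obviously sufficient for
$\models \bigwedge_{\ell \in \Gamma} \ell \to \ell_0$. We distinguish two
cases to prove its necessity:

(i) $\ell_0 \in Lit(\ell_p) - (\{G\oplus^k\ell_p, H\oplus^k\ell_p \mid k \in \mathbb{Z}\} \cup \{GH\ell_p\})$

(ii) $\ell_0 \in \{G\oplus^k\ell_p, H\oplus^k\ell_p \mid k \in \mathbb{Z}\} \cup \{GH\ell_p\}$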

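Note that, as a consequence of the definition of $Lit(\ell_p)$,
$\bigvee_{i=1}^{n} \ell_i \in Lit(\ell_p)$ means that there exists
$\ell' \in Lit(\ell_p)$ such that $\bigvee_{i=1}^{n} \ell_i \equiv \ell'$ in
FNext± (similarly for $\bigwedge_{i=1}^{n} \ell_i \in Lit(\ell_p)$).

Fig. 2. Lattice $Lit^{+}(\ell_p)$.

(i) In this case, we prove that there exists $\ell \in \Gamma$ such that
$\ell \leq \ell_0$. We may consider the following cases:

(i.1) $\ell_0 = \top$   (i.2) $\ell_0 = \bot$   (i.3) $\ell_0 \in \{\oplus^k\ell_p \mid k \in \mathbb{Z}\}$

(i.4) $\ell_0 = FG\ell_p$   (i.5) $\ell_0 = GF\ell_p$   (i.6) $\ell_0 \in \{F\oplus^k\ell_p \mid k \in \mathbb{Z}\}$

(i.7) $\ell_0 = PH\ell_p$   (i.8) $\ell_0 = HP\ell_p$   (i.9) $\ell_0 \in \{P\oplus^k\ell_p \mid k \in \mathbb{Z}\}$

If $\ell_0 = \top$ it is obvious because $\ell \leq \top$ for all
$\ell \in \Gamma$. If $\ell_0 = \bot$, then
$\bigwedge_{\ell \in \Gamma} \ell \equiv \bot$ and, therefore,
$\bot \in \Gamma$, which implies $\ell_0 \in \Gamma\!\uparrow$.

We only prove (i.4) (the other cases can be proved similarly): suppose
$\ell_0 = FG\ell_p$. If there does not exist $\ell \in \Gamma$ such that
$\ell \leq \ell_0$, then

$\Gamma \cap (\{G\oplus^k\ell_p \mid k \in \mathbb{Z}\} \cup \{GH\ell_p, \bot\}) = \emptyset$

Furthermore, from the finiteness of $\Gamma$, there exists $k_0$ such that

- if $H\oplus^k\ell_p \in \Gamma$ then $k < k_0$

- if $\oplus^k\ell_p \in \Gamma$ then $k < k_0$

Therefore, for all $t \in \mathbb{Z}$, any interpretation
$h : \mathcal{V} \to 2^{\mathbb{Z}}$ such that

$h(\ell_p) = (-\infty, t + k_0] \cup \{2n \mid n \in \mathbb{Z}\}$

satisfies $t \in h(\bigwedge_{\ell \in \Gamma} \ell)$ and
$t \notin h(\ell_0) = h(FG\ell_p)$. So,
$\not\models \bigwedge_{\ell \in \Gamma} \ell \to \ell_0$, contrary
to the hypothesis. Therefore, there exists $\ell \in \Gamma$ such that
$\ell \leq \ell_0$ in this case.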

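(ii) In this case, we prove that there exists $\Gamma' \subseteq \Gamma$ such
that $\bigwedge_{\ell \in \Gamma'} \ell \equiv \ell'$ and $\ell' \leq \ell_0$.
We may consider the following cases:

(ii.1) $\ell_0 \in \{G\oplus^k\ell_p \mid k \in \mathbb{Z}\}$   (ii.2) $\ell_0 \in \{H\oplus^k\ell_p \mid k \in \mathbb{Z}\}$

(ii.3) $\ell_0 = GH\ell_p$

We only prove (ii.3) (the other cases can be proved similarly): if
$\ell_0 = GH\ell_p$ then one of the following conditions is fulfilled:

- $GH\ell_p \in \Gamma$ or $\bot \in \Gamma$

- there exist $k_1$ and $k_2$ such that
  $H\oplus^{k_1}\ell_p, G\oplus^{k_2}\ell_p \in \Gamma$ and $k_1 > k_2$. In
  this case $H\oplus^{k_1}\ell_p \wedge G\oplus^{k_2}\ell_p \equiv GH\ell_p$
  and $GH\ell_p \leq \ell_0$

- there exist $k_1$ and $k_2$ such that
  $H\oplus^{k_1}\ell_p, G\oplus^{k_2}\ell_p \in \Gamma$, $k_1 \leq k_2$ and
  $\oplus^k\ell_p \in \Gamma$ for all $k_1 \leq k \leq k_2$. In this case

  $H\oplus^{k_1}\ell_p \wedge \bigwedge_{k=k_1}^{k_2} \oplus^k\ell_p \wedge G\oplus^{k_2}\ell_p \equiv GH\ell_p$ and $GH\ell_p \leq \ell_0$

Suppose that none of these conditions holds. Then there exists
$k_0 \in \mathbb{Z}$ such that:

- $\oplus^{k_0}\ell_p \notin \Gamma$.

- If $G\oplus^k\ell_p \in \Gamma$ then $k \geq k_0$.

- If $H\oplus^k\ell_p \in \Gamma$ then $k \leq k_0$.

Now, given any $t \in \mathbb{Z}$, every interpretation $h$ such that
$h(\ell_p) = \mathbb{Z} - \{t + k_0\}$ satisfies
$t \in h(\bigwedge_{\ell \in \Gamma} \ell)$. But
$t \notin h(\ell_0) = h(GH\ell_p)$ and, consequently,
$\not\models \bigwedge_{\ell \in \Gamma} \ell \to \ell_0$, contrary to the
hypothesis.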

We conclude this section with the following definition: 

Definition 7. Let $B$ be a wff; a literal $\ell$ is an implicate of $B$ if $\models B \to \ell$, and
it is an implicant of $B$ if $\models \ell \to B$.
4 Closed Sets of Literals, Bases, and Δ-Lists

Definition 8. A non-empty set $\Sigma \subseteq Lit(\ell_p)$ is said to be $\alpha$-closed if it contains
all the literals which are implicates of the conjunction of any two elements of $\Sigma$;
i.e., for each $\ell_1, \ell_2 \in \Sigma$ and $\ell \in Lit(\ell_p)$, if $\models (\ell_1 \wedge \ell_2) \to \ell$, we have $\ell \in \Sigma$.

Dually, a non-empty set $\Sigma \subseteq Lit(\ell_p)$ is said to be $\beta$-closed^3 if it contains
all the literals which are implicants of the disjunction of any two elements of $\Sigma$;
i.e., for each $\ell_1, \ell_2 \in \Sigma$ and $\ell \in Lit(\ell_p)$, if $\models \ell \to (\ell_1 \vee \ell_2)$, we have $\ell \in \Sigma$.

The following theorem, jointly with Lemma 1, expresses the good behaviour
of the $\alpha$-closed and $\beta$-closed sets.

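Theorem 1. Let $\Sigma \subseteq Lit(\ell_p)$ be a non-empty set of literals, then

1. $\Sigma$ is $\alpha$-closed if and only if

a) $\Sigma\uparrow = \Sigma$ and

b) for all $\ell, \ell' \in \Sigma$ such that $\ell \wedge \ell' \in Lit(\ell_p)$ we have $\ell \wedge \ell' \in \Sigma$.

2. $\Sigma$ is $\beta$-closed if and only if

a) $\Sigma\downarrow = \Sigma$ and

b) for all $\ell, \ell' \in \Sigma$ such that $\ell \vee \ell' \in Lit(\ell_p)$ we have $\ell \vee \ell' \in \Sigma$.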

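Proof. We only show the proof for item 1 (item 2 can be proved similarly). First
we prove the necessary condition: Let $\Sigma \subseteq Lit(\ell_p)$ be an $\alpha$-closed set. Then:

(a) By definition, $\Sigma \subseteq \Sigma\uparrow$. Moreover, if $\ell \in \Sigma\uparrow$ there exists $\ell_1 \in \Sigma$ such that
$\models \ell_1 \to \ell$. Therefore $\models (\ell_1 \wedge \ell_1) \to \ell$ and, by hypothesis, $\ell \in \Sigma$.

(b) If $\ell_1, \ell_2 \in \Sigma$ and $\ell_1 \wedge \ell_2 = \ell$, then $\models (\ell_1 \wedge \ell_2) \to \ell$ and, by hypothesis, $\ell \in \Sigma$.

Conversely, let $\Sigma \subseteq Lit(\ell_p)$ satisfy conditions (a) and (b). If $\ell_1, \ell_2 \in \Sigma$ and
$\models (\ell_1 \wedge \ell_2) \to \ell_0$, then Lemma 1 ensures either $\ell_0 \in \Sigma\uparrow$ or $\ell_1 \wedge \ell_2 = \ell \in Lit(\ell_p)$
and $\ell \leq \ell_0$. Now, from the hypotheses (a) and (b), we have $\ell_0 \in \Sigma$.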

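Corollary 1. The only $\alpha$-closed and $\beta$-closed set of literals is $Lit(\ell_p)$.

Our next objective is to characterize the minimum $\alpha$-closed set (resp. $\beta$-closed
set) which contains a given set of literals.

Definition 9. Given $\Gamma \subseteq Lit(\ell_p)$, we define the $\alpha$-closure of $\Gamma$, denoted by
$(\Gamma)^\alpha$, as follows:

$(\Gamma)^\alpha = \{\ell_0 \in Lit(\ell_p) \mid$ there exists a finite set $\Gamma' \subseteq \Gamma$ such that $\models \bigwedge_{\ell\in\Gamma'} \ell \to \ell_0\}$

Dually, we define the $\beta$-closure of $\Gamma$, denoted by $(\Gamma)^\beta$, as follows:

$(\Gamma)^\beta = \{\ell_0 \in Lit(\ell_p) \mid$ there exists a finite set $\Gamma' \subseteq \Gamma$ such that $\models \ell_0 \to \bigvee_{\ell\in\Gamma'} \ell\}$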

Example 2.-

1. $(\emptyset)^\alpha = \{\top\}$, $(\{\top\})^\alpha = \{\top\}$, $(\{\bot\})^\alpha = Lit(\ell_p)$

2. $(\{\ell\})^\alpha = \ell\uparrow$

3. If $\ell_1 \leq \ell_2$ then $(\{\ell_1, \ell_2\})^\alpha = \ell_1\uparrow$

^3 We use the terms $\alpha$-closed and $\beta$-closed in honour of Smullyan.



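Theorem 2. Let $\Gamma \subseteq Lit(\ell_p)$, then $(\Gamma)^\alpha = \bigcap \{\Sigma_i \mid \Sigma_i$ is $\alpha$-closed and $\Gamma \subseteq \Sigma_i\}$
and $(\Gamma)^\beta = \bigcap \{\Sigma_i \mid \Sigma_i$ is $\beta$-closed and $\Gamma \subseteq \Sigma_i\}$.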

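Example 3.- Let $\Gamma = \{\ominus p, p, Gp\}$ be a set of literals, then

$(\Gamma)^\alpha = \{F\oplus^k p \mid k \in \mathbb{Z}\} \cup \{P\oplus^k p \mid k \geq 0\} \cup \{\oplus^k p \mid k \geq -1\}$
$\cup \{G\oplus^k p \mid k \geq -2\} \cup \{\top, FGp, GFp, FPp\}$

This example is illustrated in figure 3a.

Fig. 3. a) Closure of $\Gamma$. b) $\alpha$-generating sets for $\Sigma$.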

Example 4.- Let F = {H ©^ p, G 0^ p} be a set of literals, then 
$(\Gamma)^\alpha = GHp\uparrow = Lit(p) \setminus \{\bot\}$

The previous theorem leads directly to the following result: 

Corollary 2. $\Sigma \subseteq Lit(\ell_p)$ is $\alpha$-closed (resp. $\beta$-closed) if and only if $\Sigma = (\Sigma)^\alpha$
(resp. $\Sigma = (\Sigma)^\beta$).
4.1 α-Bases and β-Bases

Our objective is to obtain an efficient management of the $\alpha$-closed and $\beta$-closed
sets, so the next step is to look for the minimal sets that generate them.

Definition 10. Let $\Sigma \subseteq Lit(\ell_p)$ be an $\alpha$-closed (respectively $\beta$-closed) set; we
say that a set $\Gamma \subseteq \Sigma$ is $\alpha$-generating (resp. $\beta$-generating) for $\Sigma$ if and only if
$(\Gamma)^\alpha = \Sigma$ (resp. $(\Gamma)^\beta = \Sigma$).

Example 5.- Fi = {H ©^p,G0^p} and E 2 = {GFFp} are a-generating sets for 
$\Sigma = Lit(p) \setminus \{\bot\}$. These sets are depicted in figure 3b.

Definition 11. Let $\Sigma \subseteq Lit(\ell_p)$ be an $\alpha$-closed (respectively $\beta$-closed) set. We
say that $\Gamma \subseteq Lit(\ell_p)$ is an $\alpha$-base (resp. $\beta$-base) for $\Sigma$ if the two following
conditions are fulfilled:

- $\Gamma$ is a non-empty $\alpha$-generating (resp. $\beta$-generating) set for $\Sigma$,

- all the elements of $\Gamma$ are minimal (resp. maximal) elements of $\Sigma$ in the
lattice $(Lit(\ell_p), \leq)$.

From now on, if there is no ambiguity, we will use the term base instead
of $\alpha$-base or $\beta$-base.

Example 6.- $\Gamma_2 = \{GHp\}$ is an $\alpha$-base for $\Sigma = Lit(p) \setminus \{\bot\}$ (see figure 3).

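Lemma 2. Given $\Sigma \subseteq Lit(\ell_p)$ $\alpha$-closed (respectively $\beta$-closed) and $\emptyset \neq \Gamma \subseteq Lit(\ell_p)$,
$\Gamma$ is a base for $\Sigma$ if and only if the two following conditions are fulfilled:

- $\Gamma$ is a generating set for $\Sigma$.

- $\ell \wedge \ell' \notin Lit(\ell_p)$ (resp. $\ell \vee \ell' \notin Lit(\ell_p)$), for all $\ell, \ell' \in \Gamma$.

As a direct consequence of this lemma, we have the following lemmas which
characterize the contents of the bases:

Lemma 3. Let $\Sigma \subseteq Lit(\ell_p)$ be an $\alpha$-closed (or $\beta$-closed) set and let $\Gamma$ be a base
for $\Sigma$. Then

- If there exists $\ell \in \{\bot, \top, GH\ell_p, FP\ell_p\} \cap \Gamma$ then $\Gamma = \{\ell\}$.

- $\Gamma$ contains at most one element of the set

$\{FG\ell_p, GF\ell_p\} \cup \{F\oplus^k \ell_p \mid k \in \mathbb{Z}\} \cup \{G\oplus^k \ell_p \mid k \in \mathbb{Z}\}$

and at most one element of the set

$\{PH\ell_p, HP\ell_p\} \cup \{P\oplus^k \ell_p \mid k \in \mathbb{Z}\} \cup \{H\oplus^k \ell_p \mid k \in \mathbb{Z}\}$

- If $F\oplus^k \ell_p \in \Gamma$ or $G\oplus^k \ell_p \in \Gamma$, any other literal $\oplus^{k'} \ell_p \in \Gamma$ fulfills $k' < k$

- If $P\oplus^k \ell_p \in \Gamma$ or $H\oplus^k \ell_p \in \Gamma$, any other literal $\oplus^{k'} \ell_p \in \Gamma$ fulfills $k' > k$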

Lemma 4. Let $\Sigma \subseteq Lit(\ell_p)$ and $\Gamma \subseteq \Sigma$.
1. If $\Sigma$ is $\alpha$-closed and $\Gamma$ is a base for $\Sigma$, then:
a) If $\top \in \Gamma$ then $\Gamma = \Sigma = \{\top\}$.

h) Let £p e then the following conditions are fulfilled: 

- IfHQ’^ £p, G £pG r then k' > k + 2 

— If G £p £ r, any other literal £p £ F fulfills k' < k 

— If H £p £ F, any other literal £p £ F fulfills k' > k. 

2. Dually, if $\Sigma$ is $\beta$-closed and $\Gamma$ is a base for $\Sigma$, then:

a) If $\bot \in \Gamma$ then $\Gamma = \Sigma = \{\bot\}$.

b) Let £p £ then the following conditions are fulfilled: 

— If F £p £ F, any other literal £p £ F fulfills k' < k 

— If P £p £ F, any other literal £p £ F fulfills k' > k 

- IfPQ’^ £p, F £p£F then k' >k + 2 

The following lemma illustrates the importance of the bases. 

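Lemma 5. Let $\Sigma \subseteq Lit(\ell_p)$ be $\alpha$-closed and let $\Gamma$ be a base for $\Sigma$, then
$(\Gamma)^\alpha = \Gamma\uparrow = \Sigma$. Dually, let $\Sigma \subseteq Lit(\ell_p)$ be $\beta$-closed and let $\Gamma$ be a base for $\Sigma$,
then $(\Gamma)^\beta = \Gamma\downarrow = \Sigma$.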

The following theorem ensures the uniqueness of the bases. 

Theorem 3. Let $\Sigma$ be $\alpha$-closed (respectively $\beta$-closed). If $\Gamma$ is a base for $\Sigma$,
then $\Gamma$ is the only base for $\Sigma$. Moreover, if $\Gamma$ is finite, then any other non-empty
generating set for $\Sigma$, $\Gamma'$, satisfies $|\Gamma| \leq |\Gamma'|$^4.

Proof. Let $\Sigma \subseteq Lit(\ell_p)$ be $\alpha$-closed and $\Gamma_1$ and $\Gamma_2$ two different bases for $\Sigma$;
then there exists $\ell \in Lit(\ell_p) - \{\top\}$ such that $\ell \in \Gamma_1$ and $\ell \notin \Gamma_2$.

Since $\Gamma_2$ is an $\alpha$-base for $\Sigma$, Lemma 5 ensures that there must exist $\ell' \in \Gamma_2$
such that $\ell \neq \ell'$ and $\ell \in \ell'\uparrow$. Therefore, $\ell$ is not a minimal element of $\Sigma$ and $\Gamma_1$
is not a base for $\Sigma$.

Now, suppose that the second assertion of this theorem is not true; then there
exists $\Gamma'$ ($\Gamma \neq \Gamma'$) generating for $\Sigma$ such that $|\Gamma'| < |\Gamma|$.

So, we can define a list of $\alpha$-generating sets for $\Sigma$, $\Gamma_0, \Gamma_1, \Gamma_2, \ldots, \Gamma_n, \ldots$, such
that $\Gamma_0 = \Gamma'$ and $|\Gamma| > |\Gamma_0| > |\Gamma_1| > |\Gamma_2| > \cdots > |\Gamma_n| > \cdots$ as follows:

From the uniqueness of the bases, $\Gamma_i$ is an $\alpha$-generating set for $\Sigma$, but it is not
a base. Therefore, Lemma 2 ensures that there must exist two literals $\ell, \ell' \in \Gamma_i$
($\ell \neq \ell'$), such that $\ell \wedge \ell' = \ell''$ for some other literal $\ell''$. Now, we define

$\Gamma_{i+1} = (\Gamma_i - \{\ell, \ell'\}) \cup \{\ell''\}$

The list $\Gamma_0, \Gamma_1, \Gamma_2, \ldots, \Gamma_n, \ldots$ must be finite. Furthermore, there exist $\ell_0 \in Lit(\ell_p)$
and $m \in \mathbb{N}$ such that $\Gamma_m = \{\ell_0\}$. Therefore $\Gamma_m$ is a base, contrary to
the hypothesis. The proof for the $\beta$-bases is analogous.

Lemma 2 leads to the definition of two normalization operators which transform
any finite generating set for a given $\alpha$-closed or $\beta$-closed set into its
corresponding base.

^4 $|\Gamma|$ denotes the cardinality of $\Gamma$.
Definition 12. Let $\Gamma \subseteq Lit(\ell_p)$ be a finite set; the 0-normalizor operator,
denoted by $\mathcal{N}_0$, performs the following transformations:

- $\mathcal{N}_0(\emptyset) = \{\top\}$

- If $\ell, \ell' \in \Gamma$ and $\ell \wedge \ell' = \ell'' \in Lit(\ell_p)$, then it substitutes $\ell$ and $\ell'$ by $\ell''$

Let $\Gamma \subseteq Lit(\ell_p)$ be a finite set; the 1-normalizor operator, denoted by $\mathcal{N}_1$,
performs the following transformations:

- $\mathcal{N}_1(\emptyset) = \{\bot\}$

- If $\ell, \ell' \in \Gamma$ and $\ell \vee \ell' = \ell'' \in Lit(\ell_p)$, then it substitutes $\ell$ and $\ell'$ by $\ell''$

Example 7.- Let F = {F (Bp, G (B^p, ©^, FI ©^p}. The operator A/q yields 
the following a-base for (T)“: Afo{F) = {G (B^ p,(B^, H (B^p} and the operator 
Afi yields the following /?-base for (T)^: Afi{F) = {F ©^p, ©p} 
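The pairwise-replacement rule of the 0-normalizor can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the tuple encoding of literals, the function names `meet` and `normalize`, and the restriction to the fragment built from $\oplus^k p$, $G\oplus^k p$, $H\oplus^k p$ and $GHp$ are all assumptions made for this sketch, and $G$ and $H$ are read as strict future and strict past.

```python
from itertools import combinations

# Hypothetical encoding (not from the paper):
#   ("X", k) = (+)^k p, ("G", k) = G (+)^k p, ("H", k) = H (+)^k p, ("GH",) = GHp


def meet(l1, l2):
    """Return the literal equivalent to l1 AND l2, or None if there is none.

    Only the X/G/H/GH fragment is covered; G and H are strict.
    """
    if l1 == l2:
        return l1
    for a, b in ((l1, l2), (l2, l1)):
        if a == ("GH",):
            return a  # GHp lies below every literal of this fragment
        if a[0] == "G" and b[0] == "G":
            return ("G", min(a[1], b[1]))
        if a[0] == "H" and b[0] == "H":
            return ("H", max(a[1], b[1]))
        if a[0] == "G" and b[0] == "X":
            if b[1] > a[1]:
                return a  # G (+)^k p already implies (+)^k' p for k' > k
            if b[1] == a[1]:
                return ("G", a[1] - 1)  # (+)^k p AND G (+)^k p = G (+)^(k-1) p
        if a[0] == "H" and b[0] == "X":
            if b[1] < a[1]:
                return a  # H (+)^k p already implies (+)^k' p for k' < k
            if b[1] == a[1]:
                return ("H", a[1] + 1)  # (+)^k p AND H (+)^k p = H (+)^(k+1) p
        if a[0] == "H" and b[0] == "G" and a[1] > b[1]:
            return ("GH",)  # no position is left uncovered
    return None


def normalize(literals, meet_fn=meet):
    """Replace pairs of literals by their meet until no pair has one."""
    gamma = set(literals)
    if not gamma:
        return {("TOP",)}  # the normalizor maps the empty set to {T}
    changed = True
    while changed:
        changed = False
        for l1, l2 in combinations(sorted(gamma), 2):
            m = meet_fn(l1, l2)
            if m is not None:
                gamma -= {l1, l2}
                gamma.add(m)
                changed = True
                break  # gamma changed, so restart the scan
    return gamma
```

Each replacement shrinks the set by at least one element, so the loop terminates on any finite input. The scan restarts after every replacement, which is quadratic at best; the linear-cost behaviour discussed in the text relies on ordered bases and is not reproduced here.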



4.2 Union and Intersection of Closed Sets 

As a direct consequence of the definition of closed sets, we have: 

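Lemma 6. Let $\Sigma, \Sigma' \subseteq Lit(\ell_p)$. Then, if $\Sigma$ and $\Sigma'$ are $\alpha$-closed (respectively
$\beta$-closed) then $\Sigma \cap \Sigma'$ is $\alpha$-closed (resp. $\beta$-closed).

The following example shows that this property is not true for the union.

Example 8.- Let $\Sigma = \oplus p\uparrow$ and $\Sigma' = G\oplus p\uparrow$. These sets are $\alpha$-closed, but $\Sigma \cup \Sigma'$
is not $\alpha$-closed, because $\oplus p, G\oplus p \in \Sigma \cup \Sigma'$, $\oplus p \wedge G\oplus p = Gp$ and $Gp \notin \Sigma \cup \Sigma'$.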
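The identity $\oplus p \wedge G\oplus p = Gp$ used in Example 8 can be checked pointwise. The derivation below is a sketch under two assumptions that are not restated in this section: $G$ is read as strict future ($t \in h(GA)$ iff $s \in h(A)$ for all $s > t$), and $t \in h(\oplus A)$ iff $t + 1 \in h(A)$. Both readings agree with the other examples of this section.

$$
\begin{aligned}
t \in h(\oplus p \wedge G\oplus p)
&\iff t + 1 \in h(p) \ \text{and}\ s + 1 \in h(p) \text{ for all } s > t \\
&\iff u \in h(p) \text{ for all } u \geq t + 1 \\
&\iff u \in h(p) \text{ for all } u > t \\
&\iff t \in h(Gp)
\end{aligned}
$$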

For any two given $\alpha$-closed (resp. $\beta$-closed) sets, $\Sigma$ and $\Sigma'$, we are interested
in the sets $\Sigma \cap \Sigma'$ and $(\Sigma \cup \Sigma')^\alpha$ (resp. $\Sigma \cap \Sigma'$ and $(\Sigma \cup \Sigma')^\beta$). More concretely,
we want to characterize the bases for these two sets. To this end, we define new
binary operators over finite sets of literals:

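Definition 13. Let $\Gamma, \Gamma' \subseteq Lit(\ell_p)$ be two finite sets of literals, then we define
$\Gamma \sqcup_0 \Gamma'$ and $\Gamma \sqcup_1 \Gamma'$, named 0-union and 1-union of $\Gamma$ and $\Gamma'$, as follows:

$\Gamma \sqcup_0 \Gamma' = \mathcal{N}_0(\Gamma \cup \Gamma')$;   $\Gamma \sqcup_1 \Gamma' = \mathcal{N}_1(\Gamma \cup \Gamma')$

Now, we introduce the following lemma, whose proof is immediate.

Lemma 7. Let $\Sigma, \Sigma' \subseteq Lit(\ell_p)$ be $\alpha$-closed (resp. $\beta$-closed) and $\Gamma, \Gamma'$ the bases for
$\Sigma$ and $\Sigma'$ respectively, then $\Gamma \sqcup_0 \Gamma'$ (resp. $\Gamma \sqcup_1 \Gamma'$) is the base for $(\Sigma \cup \Sigma')^\alpha$
(resp. $(\Sigma \cup \Sigma')^\beta$).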

These two union operators have a linear cost when they are applied to ordered
bases. The order of the bases is given in the following definition:

Definition 14. Let $\Gamma \subseteq Lit(\ell_p)$ be a base. The $\Delta^{\ell_p}$-list associated with $\Gamma$ is
built by ordering its elements following these criteria:

- The first element is the only element of $\Gamma$ in the set
$Lit(\ell_p) \setminus (\{FG\ell_p, GF\ell_p\} \cup \{G\oplus^k \ell_p, F\oplus^k \ell_p, \oplus^k \ell_p \mid k \in \mathbb{Z}\})$, if it exists.

- The last element is the only element of $\Gamma$ in the set
$Lit(\ell_p) \setminus (\{PH\ell_p, HP\ell_p\} \cup \{H\oplus^k \ell_p, P\oplus^k \ell_p, \oplus^k \ell_p \mid k \in \mathbb{Z}\})$, if it exists.

- The other elements (all having the shape $\oplus^k \ell_p$) are ordered increasingly
by the index $k$.
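The ordering criteria of Definition 14 can be illustrated as follows. This is a sketch only: the tuple encoding of literals and the names `PAST`, `FUTURE` and `delta_list` are assumptions made here, and the function relies on Lemma 3 (a base contains at most one past-type and at most one future-type literal, and a literal such as $GH\ell_p$ can only occur alone).

```python
# Hypothetical encoding (not from the paper): ("X", k) = (+)^k l_p,
# ("H", k), ("P", k), ("G", k), ("F", k) carry the index k of (+)^k,
# and ("PH",), ("HP",), ("FG",), ("GF",), ("GH",), ("FP",), ("TOP",),
# ("BOT",) are the remaining literals.
PAST = {"H", "P", "PH", "HP"}
FUTURE = {"G", "F", "FG", "GF"}


def delta_list(base):
    """Order a base as in Definition 14: past-type literal first,
    then the (+)^k literals by increasing k, future-type literal last."""
    past = [lit for lit in base if lit[0] in PAST]
    future = [lit for lit in base if lit[0] in FUTURE]
    middle = sorted((lit for lit in base if lit[0] == "X"),
                    key=lambda lit: lit[1])
    other = [lit for lit in base if lit[0] not in PAST | FUTURE | {"X"}]
    if other:
        # By Lemma 3 such a literal is the only element of the base.
        if len(base) != 1:
            raise ValueError("not a base: literal must occur alone")
        return other
    if len(past) > 1 or len(future) > 1:
        raise ValueError("not a base: too many past or future literals")
    return past + middle + future
```

Once both bases are in this form, a union or intersection operator can walk the two lists from their ends, which is what makes a linear-cost traversal plausible; that traversal itself is not part of this sketch.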

The following example illustrates the application of the union operator $\sqcup_0$:
Example 9.- Let $\Gamma$ and $\Gamma'$ be two bases whose associated $\Delta^{p}$-lists are:

A = [HQ^p, Qp, ©^p, ©®p, F(b'^p] and A' = [P©^p, Qp, ©^p, G©^p] respectively. 
To get T loJ T' with linear cost, we traverse these lists as follows: 

— Since iL ©^ p < P ©^ p, then P ©^ p is not an element of P iQl P'. 

— Since G©'*pA©"‘p = G©^p, then G©'*p and ©'^p do not belong to P ISJ P', 
but G ©^ p does. 

— Since G ©^ p < ©^p, then ©®p is not an element of P iQl P'. 

— Since G (B'^ p < F p, then P ©^ p is not an element of P ISJ P'. 

— The other elements of A and A' belong to P ISJ P'. 

The final result is: P loj P' = {H ©^ p, Qp, ©^p, G ©^ p}. 

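Definition 15. Given $\ell_1, \ell_2 \in Lit(\ell_p)$, $Sup(\ell_1, \ell_2)$ denotes the $\alpha$-base for
$\ell_1\uparrow \cap \ell_2\uparrow$, and $Inf(\ell_1, \ell_2)$ denotes the $\beta$-base for $\ell_1\downarrow \cap \ell_2\downarrow$.

Lemma 8. Let $\ell_1, \ell_2 \in Lit(\ell_p)$.

1. $\ell \in Sup(\ell_1, \ell_2)$ if and only if the literal $\ell$ is a minimal element of the set of
upper bounds of $\{\ell_1, \ell_2\}$.

2. $\ell \in Inf(\ell_1, \ell_2)$ if and only if the literal $\ell$ is a maximal element of the set of
lower bounds of $\{\ell_1, \ell_2\}$.

Proof. We prove only item 1 by detailing all the possible situations (see figure 4):

i. If $\ell_1 \wedge \ell_2 \in Lit(\ell_p)$ then $Inf(\ell_1, \ell_2) = \{\ell_1 \wedge \ell_2\}$

ii. Otherwise:

a) If $\ell_1 \leq HP\ell_p$ and $\ell_2 \leq GF\ell_p$ then $Inf(\ell_1, \ell_2) = \{GH\ell_p\}$

b) If $\ell \leq HP\ell_p$ then $Inf(\ell, \oplus^k \ell_p) = \{H\oplus^{k+1} \ell_p\}$

c) If $\ell \leq GF\ell_p$ then $Inf(\ell, \oplus^k \ell_p) = \{G\oplus^{k-1} \ell_p\}$

d) If $k_2 > k_1$ then $Inf(\oplus^{k_1} \ell_p, \oplus^{k_2} \ell_p) = \{H\oplus^{k_2+1} \ell_p, G\oplus^{k_1-1} \ell_p\}$

e) If $k_2 > k_1$ then $Inf(P\oplus^{k_1} \ell_p, \oplus^{k_2} \ell_p) = \{H\oplus^{k_2+1} \ell_p, G\oplus^{k_1-2} \ell_p\}$ and
$Inf(\oplus^{k_1} \ell_p, F\oplus^{k_2} \ell_p) = \{H\oplus^{k_2+2} \ell_p, G\oplus^{k_1-1} \ell_p\}$

f) If $k_2 \geq k_1 - 1$ then $Inf(P\oplus^{k_1} \ell_p, F\oplus^{k_2} \ell_p) = \{H\oplus^{k_2+2} \ell_p, G\oplus^{k_1-2} \ell_p\}$

g) If $k_2 < k_1 - 1$ then $Inf(P\oplus^{k_1} \ell_p, F\oplus^{k_2} \ell_p) = \{\oplus^k \ell_p \mid k_2 < k < k_1\}$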


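Definition 16. Let $\Gamma, \Gamma' \subseteq Lit(\ell_p)$ be two finite sets; we define $\Gamma \sqcap_0 \Gamma'$ and
$\Gamma \sqcap_1 \Gamma'$, named 0-intersection and 1-intersection of $\Gamma$ and $\Gamma'$ respectively, as
follows:

$\Gamma \sqcap_0 \Gamma' = \mathcal{N}_0\Big(\bigcup_{\ell\in\Gamma,\ \ell'\in\Gamma'} Sup(\ell, \ell')\Big)$;   $\Gamma \sqcap_1 \Gamma' = \mathcal{N}_1\Big(\bigcup_{\ell\in\Gamma,\ \ell'\in\Gamma'} Inf(\ell, \ell')\Big)$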
Fig. 4. Cases (a) to (g) for $Inf(\ell_1, \ell_2)$



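Theorem 4. Let $\Sigma, \Sigma' \subseteq Lit(\ell_p)$ be $\alpha$-closed (resp. $\beta$-closed) and $\Gamma, \Gamma'$ the bases
for $\Sigma$ and $\Sigma'$ respectively, then $\Gamma \sqcap_0 \Gamma'$ (resp. $\Gamma \sqcap_1 \Gamma'$) is the base for $\Sigma \cap \Sigma'$.

Proof. Since $\Sigma$ and $\Sigma'$ are $\alpha$-closed sets, we have $\Sigma = \Gamma\uparrow$ and $\Sigma' = \Gamma'\uparrow$.
Consequently,

$\Sigma \cap \Sigma' = \Gamma\uparrow \cap \Gamma'\uparrow = \Big(\bigcup_{\ell\in\Gamma} \ell\uparrow\Big) \cap \Big(\bigcup_{\ell'\in\Gamma'} \ell'\uparrow\Big) = \bigcup_{\ell\in\Gamma,\ \ell'\in\Gamma'} (\ell\uparrow \cap \ell'\uparrow)$,

which is $\alpha$-closed.

Since for all $\ell \in \Gamma$ and $\ell' \in \Gamma'$ the base for $\ell\uparrow \cap \ell'\uparrow$ is $Sup(\ell, \ell')$, lemma 3
ensures that

$\Sigma \cap \Sigma' = \Big(\mathcal{N}_0\Big(\bigcup_{\ell\in\Gamma,\ \ell'\in\Gamma'} Sup(\ell, \ell')\Big)\Big)\uparrow = (\Gamma \sqcap_0 \Gamma')\uparrow$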

Remark 1. The operators $\sqcup_0$, $\sqcup_1$, $\sqcap_0$ and $\sqcap_1$ are commutative and associative (which
allows us to apply them to any finite number of arguments). Moreover, we have:

(i) $\{\bot\} \sqcup_0 \Gamma = \{\bot\} = \{\bot\} \sqcap_1 \Gamma$

(ii) $\{\top\} \sqcup_1 \Gamma = \{\top\} = \{\top\} \sqcap_0 \Gamma$

(iii) $\{\bot\} \sqcap_0 \Gamma = \Gamma = \{\top\} \sqcup_0 \Gamma$

(iv) $\{\top\} \sqcap_1 \Gamma = \Gamma = \{\bot\} \sqcup_1 \Gamma$

The structure of the bases (lemmas 3 and 4) allows us to ensure that the two
intersection operators ($\sqcap_0$ and $\sqcap_1$) have a linear cost when they are applied to
ordered bases. The following example illustrates this:

Example 10 .- Let Fi and F2 be two bases whose associated Z\^’-lists are 

Ai = [P 0 ^ p, Op, (Bp, G p] and A2 = [p, ®p, F ©^ p] respectively. We 
compute Inf {(-,£') for all £ G Ai,^' G A2: 

— For all £1 G Ai and £2 G A2, we define 

/■/£i, £2) = /n/(£i, £2) \ {{FGp, GFp} U {F ©%, G 0 % I A: G Z}) 

Notice that if there exist $\ell_1$ and $\ell_2$ ($\ell_1 \neq \ell_2$) such that one of the following
conditions is fulfilled:
[Table: $Inf(\ell, \ell')$ for each $\ell \in \Delta_1$ (rows) and each $\ell' \in \Delta_2$ (columns)]

Fig. 5. Computation of $Inf(\ell, \ell')$



— ii is subsequent to P p in Ai 

— £2 is subsequent to p in A2 

then for all £ G I~f{£i,£2) we have £ < H (Bp and the normalizor Afi removes 
it from the final result. 

- On the other side, for all £\ G Ai and £2 € A2, we define 

I^f{£i,£ 2 )=Inf{£i,£ 2 ) \ {{PHp,HPp}U{P(B’^p,H(B^p \kGZ}) 

Notice that if there exist $\ell_1$ and $\ell_2$ ($\ell_1 \neq \ell_2$) such that one of the following
conditions is fulfilled:

— £1 is preceding to G (B^ p in Ai 

— £2 is preceding to F p in A2 

then for all £ G I^f{£i,£2) we have £ < G(B^p and the normalizor A/i removes 
it from the final result. 

Finally, the elements of Ai fi| A2 are: 

- ©p (because ©p G Ai 0 A2). 

- H (Bp (because H (Bp G I~f{P ©^ p,p))- 

- G ©^ p (because G ©^ p G ©^ p, F ©^ p)). 

These elements are framed in figure 0 The computation of the other literals of 
the figure is superfluous. The final result is: Ai fh A2 = {FI © p, ©p, G ©^ p}. 



4.3 Sets of Implicates and Implicants 

Our underlying idea is to develop a set of reduction strategies which, through
the efficient determination and manipulation of subsets of $Lit(\ell_p)$ which contain
unitary implicates and implicants of any given wff $A$, denoted $\mathcal{I}_0^{\ell_p}(A)$ and
$\mathcal{I}_1^{\ell_p}(A)$ respectively, analyse the possibility of reducing the size of $A$. $\mathcal{I}_0^{\ell_p}(A)$ and
$\mathcal{I}_1^{\ell_p}(A)$ satisfy the following property:

Lemma 9. Let $A$ and $B$ be wffs, then:

$\mathcal{I}_0^{\ell_p}(A) \cap \mathcal{I}_0^{\ell_p}(B) = \mathcal{I}_0^{\ell_p}(A \vee B)$;   $\mathcal{I}_1^{\ell_p}(A) \cap \mathcal{I}_1^{\ell_p}(B) = \mathcal{I}_1^{\ell_p}(A \wedge B)$

$(\mathcal{I}_0^{\ell_p}(A) \cup \mathcal{I}_0^{\ell_p}(B))^\alpha \subseteq \mathcal{I}_0^{\ell_p}(A \wedge B)$;   $(\mathcal{I}_1^{\ell_p}(A) \cup \mathcal{I}_1^{\ell_p}(B))^\beta \subseteq \mathcal{I}_1^{\ell_p}(A \vee B)$

Since these sets are $\alpha$-closed and $\beta$-closed, respectively, although they can be
infinite, they can be treated efficiently using their corresponding bases and the
union and intersection operators introduced in definitions 13 and 16.
Abstract. We introduce a resource adaptive agent mechanism which
supports the user of an interactive theorem proving system. The mechanism,
an extension of [5], uses a two layered architecture of agent societies
to suggest applicable commands together with appropriate command
argument instantiations. Experiments with this approach show that its
effectiveness can be further improved by introducing a resource concept.
In this paper we provide an abstract view on the overall mechanism,
motivate the necessity of an appropriate resource concept and discuss its
realization within the agent architecture.



1 Introduction 

Interactive theorem provers have been developed to overcome the shortcomings 
of purely automatic systems and are typically applied in demanding domains 
where fully automated techniques usually fail. Interaction is needed, for example, 
to speculate lemmata or to guide and control the reasoning, for instance, by 
providing the crucial steps in a complicated proof attempt. 

Typical tactic-based interactive theorem proving systems such as Hol [TSl ,
Tps [^, or our own ΩMEGA offer expressive problem formulation and
communication languages and employ human oriented calculi (e.g., a higher-order
natural deduction or sequent calculus) in order to keep both proof and proof
construction comprehensible.

Initially, problems are given as a theorem together with a set of axioms. 
Proofs are then constructed by successive application of tactics which are either 
rules from the given calculus or little procedures that apply sequences of such 
rules (cf. IH] ) . Generally, the user can employ a tactic by invoking an associated 
command. Tactics can be applied forward to axioms and derived facts or back- 
ward to open problems which may result in one or several new open problems. 
A proof is complete when no open subgoal remains, i.e. when the originally
given theorem is successfully justified by a derivation from the given axioms. In
most systems the user can easily combine existing tactics in order to build new, 
possibly more abstract ones. Moreover, some systems offer the use of external 
reasoning components such as automated theorem provers or computer algebra 
systems in order to enhance their reasoning power. 



P. Barahona and J.J. Alferes (Eds.): EPIA’99, LNAI 1695, pp. 208- 12211 1999. 
(c) Springer- Verlag Berlin Heidelberg 1999 



Critical Agents Supporting Interactive Theorem Proving 209 



The number of tactics (and therefore the number of commands) offered to the 
user by an interactive theorem prover is often quite large. Thus, it is important to 
support the user (especially the non-expert user) in selecting the right command 
together with appropriate instantiations for its parameters (e.g., proof lines, 
terms, or sub-term positions) in each proof step. 

Although suggestion mechanisms are already provided in state of the art 
interactive theorem provers, they are still rather limited in their functionality as 
they usually 

(i) use inflexible sequential computation strategies, 

(ii) do not have anytime character, 

(iii) do not work steadily and autonomously in the background of a system, and

(iv) do not exhaustively use available computation resources.

In order to overcome these limitations we proposed in [5] a new, flexible
support mechanism with anytime character. It suggests commands applicable
in the current proof state (more precisely, commands that invoke applicable
tactics) together with suitable argument instantiations^1. It is based on two
layers of societies of autonomous, concurrent agents which steadily work in the
background of the system and dynamically adapt their computational behavior
to the state of the proof and/or specific user queries to the suggestion
mechanism. By exchanging relevant results via blackboards the agents cooperatively
accumulate useful command suggestions which can then be heuristically sorted
and presented to the user.

A first implementation of the support mechanism in the ΩMEGA-system yielded
promising results. However, experience showed that the number of agents
can become quite large and that some agents perform very costly computations,
such that the initial gain of the distributed architecture and the use of
concurrency is easily outweighed by the mechanism's computational costs. In a first
step to overcome this dilemma we developed a resource adapted concept for the
agents in order to allow for efficient suggestions even in large examples. However,
the concurrent nature of the mechanism provides a good basis to switch from a
static to a dynamic, resource adaptive control of the mechanism's computational
behavior^2. Thereby, we can exploit both knowledge on the prior performance of
the mechanism as well as knowledge on classifying the current proof state and
single agents in order to distribute resources.

^1 Whereas in [5] and in this paper the suggestion mechanism is described with respect
to tactical theorem proving based on an ND-calculus [12], we want to point out that
our mechanism is in no way restricted to a specific logic or calculus, and can easily
be adapted to other interactive theorem proving contexts as well.

^2 In this paper we adopt the notions of resource adapted and resource adaptive as
defined in [21], where the former notion means that agents behave with respect to
some initially set resource distribution. According to the latter concept agents have
an explicit notion of resources themselves, enabling them to actively participate in
the dynamic allocation of resources.

After giving an example in the next section, to which we will refer throughout
this paper, we review in Sec. 3 our two layered agent mechanism as introduced
in [5]. In Sec. 4 we present a static resource concept to enhance the mechanism.
This concept is then extended in the subsequent section into a resource adaptive
one, where the resource allocations are dynamic and based on the following criteria:

1. The lower layer agents monitor their own contributions and performance in 
the past in order to estimate the fruitfulness of their future computations. 

2. The resource allocations of the societies of lower layer agents are dynamically
monitored and adjusted on the upper layer.

3. A classification agent gathers explicit knowledge about the current proof 
state (e.g., which theory or which logic the current subgoal belongs to) and 
passes this information to the lower layer agents. 

Hence, the agents in our mechanism have a means to decide whether or not
they should pursue their own intentions in a given proof state. Their decision is
based on sub-symbolic (1 and 2) as well as on symbolic information (3). We
finally conclude by discussing what a state of the art interactive theorem prover
can gain from employing the proposed suggestion mechanism and by hinting at
possible future work.



2 Reference Example 

In the remainder of this paper we will frequently refer to the proof of the
higher-order (HO) theorem $(p_{o \to o}\,(a_o \wedge b_o)) \Rightarrow (p\,(b \wedge a))$, where $o$ denotes the type of
truth values. Informally this example states: if the truth value of $a \wedge b$ is an element
of the set $p$ of truth values, then the value of $b \wedge a$ is also in $p$. Alternatively one
can read the problem as follows: whenever a unary logical operator $p$ maps the
value of $a_o \wedge b_o$ to true, then this also holds for the value of $b \wedge a$. Although
the theorem looks quite simple at first glance, this little HO
problem cannot be solved by most automatic HO theorem provers known to the
authors, since it requires the application of the extensionality principles, which
are generally not built into HO theorem proving systems. However, within the
ΩMEGA-system [5] this problem can easily be proven partially interactively and
automatically.

ΩMEGA employs a variant of Gentzen's natural deduction calculus (ND) [12]
enriched by more powerful proof tactics and the possibility to delegate reasonably
simple sub-problems to automated theorem provers. Thus, the following proof
for the example theorem can be constructed^3:

L1   (L1)   $\vdash (p\,(a \wedge b))$                                   Hyp
L4   (L1)   $\vdash (b \wedge a) \Leftrightarrow (a \wedge b)$                        Otter
L3   (L1)   $\vdash (b \wedge a) = (a \wedge b)$                          $\Leftrightarrow$2=: (L4)
L2   (L1)   $\vdash (p\,(b \wedge a))$                                   =subst: ($\langle 1 \rangle$)(L1 L3)
C    $\emptyset$      $\vdash (p\,(a \wedge b)) \Rightarrow (p\,(b \wedge a))$               $\Rightarrow_I$: (L2)

^3 Linearized ND proofs are presented as described in [1]. Each proof line consists of a
label, a set of hypotheses, the formula and a justification.
The idea of the proof is to show that the truth value of $a \wedge b$ equals that of
$b \wedge a$ (lines L3 and L4) and then to employ equality substitution (line L2). The
equation $(b \wedge a) = (a \wedge b)$ is derived by application of boolean extensionality
from the equivalence $(b \wedge a) \Leftrightarrow (a \wedge b)$ in line L4, whereas the rewriting step in
line L2 is indicated in the line's justification, which reads as 'substitute the
sub-term at position $\langle 1 \rangle$ in line L1 according to the equation stated in line L3', where
position $\langle 1 \rangle$ corresponds to the first argument of the predicate $p$. The equivalence
$(b \wedge a) \Leftrightarrow (a \wedge b)$ in line L4 can either be proven interactively or, as in the given
proof, justified by the application of the first-order prover Otter, which
hides the more detailed subproof.
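The three steps of this proof idea (equivalence, extensionality, substitution) can be mirrored in a modern proof assistant. The Lean 4 sketch below is only an illustration and is unrelated to the ΩMEGA implementation; the theorem name `p_and_comm` and the hypothesis names are chosen here, and Lean's `propext` axiom plays the role of boolean extensionality.

```lean
theorem p_and_comm (p : Prop → Prop) (a b : Prop)
    (h : p (a ∧ b)) : p (b ∧ a) := by
  -- corresponds to line L4: the equivalence (b ∧ a) ↔ (a ∧ b)
  have hEquiv : (b ∧ a) ↔ (a ∧ b) :=
    ⟨fun ⟨hb, ha⟩ => ⟨ha, hb⟩, fun ⟨ha, hb⟩ => ⟨hb, ha⟩⟩
  -- corresponds to line L3: extensionality turns the equivalence into an equation
  have hEq : (b ∧ a) = (a ∧ b) := propext hEquiv
  -- corresponds to line L2: substitute the argument of p using the equation
  rw [hEq]
  exact h
```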

Our agent mechanism is able to suggest all the single proof steps together
with the respective parameter instantiations to the user. In particular, for the
proof of line L4 it suggests the choice between the application of Otter or
the next interactive proof step to the user. In case of the latter choice, the
mechanism's further suggestions can be used to finish the proof completely
interactively.

In the remainder of this paper we use the proof of the presented example 
in this section to demonstrate the working scheme of the suggestion mechanism 
and to motivate the incorporation of resource concepts. 



3 Suggesting Commands 

The general suggestion mechanism is based on a two layered agent architecture
displayed in Fig. 1, which shows the actual situation after the first proof step in
the example, where the backward application of $\Rightarrow_I$ introduces the line L1 as new
hypothesis and the line L2 as the new open goal. The task of the bottom layer
of agents (cf. the lower part of Fig. 1) is to compute possible argument
instantiations for the prover's commands in dependence of the dynamically changing
partial proof tree. The task of the top layer (cf. the upper part of Fig. 1) is to
collect the most appropriate suggestions from the bottom layer, to heuristically
sort them and to present them to the user.

The bottom layer consists of societies of argument agents where each society
belongs to exactly one command associated with a proof tactic (a more formal
notion of proof tactic is introduced in Sec. 3.1). On the one hand each argument
agent has its own intention, namely to search in the partial proof for a proof
line that suits a particular specification. On the other hand argument agents
belonging to the same society also pursue a common goal, e.g., to cooperatively
compute most complete argument suggestions (cf. the concept of partial
argument instantiations in Sec. 3.1) for their associated command. Therefore
the single agents of a society exchange their particular results via a suggestion
blackboard and try to complete each other's suggestions.

The top layer consists of a single society of command agents which steadily 
monitor the particular suggestion blackboards on the bottom layer. For each 
suggestion blackboard there exists one command agent whose intention is to 



212 C. Benzmuller and V. Sorge 




Fig. 1. The two layered suggestion mechanism. 

determine the most complete suggestions and to put them on the command 
blackboard. 

The whole distributed agent mechanism always runs in the background of
the interactive theorem proving environment, thereby constantly producing
command suggestions that are dynamically adjusted to the current proof state. At
any time the suggestions on the command blackboard are monitored by an
interface component which presents them heuristically sorted to the user via a
graphical user interface. As soon as the user executes a command the partial
proof is updated and simultaneously the suggestion and command blackboards
are reinitialized.
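The role of the top layer can be pictured with a small sketch. It is purely illustrative and rests on assumptions that the text does not state: a suggestion is modelled as a dictionary of instantiated arguments, "most complete" is taken to mean "most arguments instantiated", and the sorting heuristic simply ranks commands by that count. The names `most_complete` and `command_blackboard` are hypothetical.

```python
from typing import Dict, List, Tuple

# A suggestion maps instantiated formal arguments to actual arguments.
Suggestion = Dict[str, object]


def most_complete(suggestions: List[Suggestion]) -> Suggestion:
    """What one command agent does: pick the fullest entry on the
    suggestion blackboard it monitors (empty dict if there is none)."""
    return max(suggestions, key=len, default={})


def command_blackboard(
    boards: Dict[str, List[Suggestion]]
) -> List[Tuple[str, Suggestion]]:
    """Collect the best suggestion per command and sort them.

    Assumed heuristic: commands with more instantiated arguments first;
    commands with no instantiated argument at all are not suggested.
    """
    entries = [(cmd, most_complete(sugg)) for cmd, sugg in boards.items()]
    entries = [(cmd, best) for cmd, best in entries if best]
    return sorted(entries, key=lambda entry: len(entry[1]), reverse=True)
```

A real interface component would re-read the command blackboard whenever it changes; here the function is simply called on a snapshot of all suggestion blackboards.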

3.1 Partial Argument Instantiations 

The data that is exchanged within the blackboard architecture heavily depends 
on a concept called a partial argument instantiation of a command. In order to 
clarify our mechanism we need to introduce this concept in detail. 

In an interactive theorem prover such as ΩMEGA one generally has one
command associated with each proof tactic that invokes the application of this tactic
to a set of proof lines. In ΩMEGA these tactics have a fixed outline, i.e. a set
of premise lines, conclusion lines and additional parameters, such as terms or
term-positions. Thus the general instance of a tactic $\mathcal{T}$ can be formalized in the
following way:

$$\frac{P_1 \cdots P_l}{C_1 \cdots C_m}\ \mathcal{T}(Q_1 \cdots Q_n)$$

where we call the $P_i, C_j, Q_k$ the formal arguments of the tactic $\mathcal{T}$ (we give an
example below).

We can now denote the command $t$ invoking tactic $\mathcal{T}$ formally in a similar
fashion as

$$\frac{p_{i_1} \cdots p_{i_{l'}}}{c_{j_1} \cdots c_{j_{m'}}}\ t(q_{k_1} \cdots q_{k_{n'}})$$

where the formal arguments $p_i, c_j, q_k$ of $t$ correspond to a subset of the formal
arguments of the tactic. To successfully execute the command some, not
necessarily all, formal arguments have to be instantiated with actual arguments, e.g.,
proof lines. A set of pairs relating each formal argument of the command to
a (possibly empty) actual argument is called a partial argument instantiation
(PAI).

We illustrate the idea of a PAI using the tactic for equality substitution =subst
and its corresponding command =Subst as an example.

$$\frac{\Phi[x] \quad x = y}{\Phi'[y]}\ {=}subst(P^*) \qquad \longrightarrow \qquad \frac{u \quad eq}{s}\ {=}Subst(pl)$$

Here $\Phi[x]$ is an arbitrary higher order formula with at least one occurrence of
the term $x$, $P^*$ is a list of term-positions representing one or several occurrences
of $x$ in $\Phi$, and $\Phi'[y]$ represents the term resulting from replacing $x$ by $y$ at all
positions $P^*$ in $\Phi$. $u$, $eq$, $s$ and $pl$ are the corresponding formal arguments of
the command associated with the respective formal arguments of the tactic. We
observe the application of this tactic to line L2 of our example:



L1   (L1)   $\vdash (p\,(a \wedge b))$   Hyp

L2   (L1)   $\vdash (p\,(b \wedge a))$   Open

One possible PAI for =Subst is the set of pairs $(u{:}L_1, eq{:}\epsilon, s{:}L_2, pl{:}\epsilon)$, where
$\epsilon$ denotes the empty or unspecified actual argument. We omit writing pairs
containing $\epsilon$ and, for instance, write the second possible PAI of the above example
as $(u{:}L_1, s{:}L_2, pl{:}(\langle 1 \rangle))$. To execute =Subst with the former PAI the user would
have to at least provide the position list, whereas using the latter PAI results in
the line L3 of the example containing the equation.
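A PAI is essentially a finite map from formal arguments to optional actual arguments. The sketch below shows one way to model it; the names `FORMAL_ARGS`, `make_pai`, `show` and `completeness` are assumptions made here, and proof lines are represented by their labels only.

```python
from typing import Dict, Optional

# Formal arguments of the =Subst command, as named in the text.
FORMAL_ARGS = ("u", "eq", "s", "pl")

# None plays the role of the empty actual argument (epsilon).
PAI = Dict[str, Optional[object]]


def make_pai(**actual: object) -> PAI:
    """Build a PAI; formal arguments that are not given stay empty."""
    unknown = set(actual) - set(FORMAL_ARGS)
    if unknown:
        raise KeyError(f"not a formal argument of =Subst: {sorted(unknown)}")
    return {name: actual.get(name) for name in FORMAL_ARGS}


def show(pai: PAI) -> str:
    """Print a PAI in the paper's style, omitting the empty pairs."""
    pairs = [f"{name}:{value}" for name, value in pai.items() if value is not None]
    return "(" + ", ".join(pairs) + ")"


def completeness(pai: PAI) -> int:
    """Number of formal arguments that already have an actual argument."""
    return sum(value is not None for value in pai.values())


# The two PAIs of the running example.
first = make_pai(u="L1", s="L2")
second = make_pai(u="L1", s="L2", pl=[(1,)])
```

Comparing `completeness(first)` with `completeness(second)` gives a crude measure of which suggestion is closer to an executable command; the text itself does not fix such a measure.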



3.2 Argument Agents 

The idea underlying our mechanism to suggest commands is to compute PAIs 
as complete as possible for each command, thereby gaining knowledge on which 
tactics can be applied combined with which argument instantiations in a given 
proof state. 

The main work is done by the societies of cooperating Argument Agents
at the bottom layer (cf. Fig. 1). Their job is to retrieve information from the
current proof state either by searching for proof lines which comply with the
agent's specification or by computing some additional parameter (e.g., a list of
sub-term positions) from already given information. Sticking to our example we
can informally specify the agents $\mathfrak{A}^{u,s}_{\emptyset}$, $\mathfrak{A}^{eq}_{\emptyset}$, $\mathfrak{A}^{eq}_{u,s}$ and $\mathfrak{A}^{pl}_{u,s}$ for the =Subst
command (cf. [S] for a formal specification):

$\mathfrak{A}^{u,s}_{\emptyset}$ = {find an open line $u$ and a support line $s$ that differ
only wrt. occurrences of a single proper sub-term}

$\mathfrak{A}^{eq}_{\emptyset}$ = {find a support line $eq$ which is an equation}

$\mathfrak{A}^{eq}_{u,s}$ = {find a support line $eq$ which is an equation
suitable for rewriting $u$ into $s$}

$\mathfrak{A}^{pl}_{u,s}$ = {compute the positions where $s$ and $u$ differ}

The attached superscripts specify the formal arguments of the command for
which actual arguments are computed, whereas the indices denote sets of formal
arguments that necessarily have to be already present in some PAI, so that the
agent can carry out its own computations. For example, agent $\mathfrak{A}^{pl}_{u,s}$ only starts
working when it detects a PAI on the blackboard where actual arguments for $u$
and $s$ have been instantiated. In contrast, $\mathfrak{A}^{eq}_{\emptyset}$ does not need any additional
knowledge in order to pursue its task of retrieving an open line containing an
equation as formula.

The agents themselves are realized as autonomous processes that concurrently
compute their suggestions and are triggered by the PAIs on the blackboard,
i.e. the results of other agents of their society. For instance, both agents
$\mathfrak{A}^{eq}_{u,s}$ and $\mathfrak{A}^{pl}_{u,s}$ would simultaneously start their search as soon as $\mathfrak{A}^{u,s}_{\emptyset}$ has
returned a result. The agents of one society cooperate in the sense that they
activate each other (by writing new PAIs to the blackboard) and furthermore
complete each other's suggestions.

Conflicts between agents do not arise, as agents that add actual parameters 
to some PAI always write a new copy of the particular PAI on the blackboard, 
thereby keeping the original less complete PAI intact. The agents themselves 
watch their suggestion blackboard (both PAI entries and additional messages) 
and running agents terminate as soon as the associated suggestion blackboard 
is reinitialized, e.g., when a command has been executed by the user. 
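The triggering and copy-on-write behaviour described above can be sketched as follows. This is a sequential simplification under our own assumptions (the real agents run concurrently); `ArgumentAgent`, `run_society` and the two toy agents are hypothetical names, and the agents return fixed results instead of searching a proof state.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Optional

PAI = Dict[str, object]  # only instantiated arguments are stored here


@dataclass(frozen=True)
class ArgumentAgent:
    name: str
    needs: FrozenSet[str]     # index: arguments that must already be in the PAI
    provides: FrozenSet[str]  # superscript: arguments this agent computes
    compute: Callable[[PAI], Optional[PAI]]  # new bindings, or None


def run_society(agents: List[ArgumentAgent], blackboard: List[PAI]) -> List[PAI]:
    """Fire agents until no agent can add a new PAI to the blackboard."""
    changed = True
    while changed:
        changed = False
        for pai in list(blackboard):          # iterate over a snapshot
            for agent in agents:
                present = set(pai)
                if not agent.needs <= present or agent.provides & present:
                    continue                  # not triggered, or nothing to add
                found = agent.compute(pai)
                if found is None:
                    continue
                extended = {**pai, **found}   # a new copy; the original stays
                if extended not in blackboard:
                    blackboard.append(extended)
                    changed = True
    return blackboard


agents = [
    ArgumentAgent("A^{u,s}_{}", frozenset(), frozenset({"u", "s"}),
                  lambda pai: {"u": "L1", "s": "L2"}),
    ArgumentAgent("A^{pl}_{u,s}", frozenset({"u", "s"}), frozenset({"pl"}),
                  lambda pai: {"pl": ((1,),)}),
]
board = run_society(agents, [{}])
```

After the run, `board` holds the empty PAI, the PAI with `u` and `s`, and the PAI extended by `pl`; the less complete entries are never overwritten.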

The left hand side of Fig. 1 illustrates our above example: The topmost
suggestion blackboard contains the two PAIs: $(u{:}L_1,\ s{:}L_2)$ computed by agent
$\mathfrak{A}^{u,s}_{\emptyset}$ and $(u{:}L_1,\ s{:}L_2,\ pl{:}((1)))$ completed by agent $\mathfrak{A}^{pl}_{u,s}$.

In the current implementation argument agents are declaratively specified. 
This strongly eases modification and enhancement of already given argument 
agents as well as the addition of new ones, even at run time. 






3.3 Command Agents 

In the society of command agents every agent is linked to a command and its task
is to initialize and monitor the associated suggestion blackboards. Its intention is
to select among the entries of the associated blackboard the most complete and
appropriate PAI and to pass it, enriched with the corresponding command name,
to the command blackboard. That is, as soon as a PAI that has at least one actual
argument instantiated is written to the related blackboard, the command
agent suggests the command as applicable in the current proof state, providing
also the PAI as possible argument instantiations. It then updates this suggestion
whenever a better PAI has been computed. In this context better generally means
a PAI containing more actual arguments. In the case of our example the current
PAI suggested by command agent $\mathfrak{C}_{=Subst}$ is $(u{:}L_1,\ s{:}L_2,\ pl{:}((1)))$.

These suggestions are accumulated on a command blackboard, that simply
stores all suggested commands together with the proposed PAI, continuously
handles updates of the latter, sorts and resorts the single suggestions and provides
a means to propose them to the user. In the case of the ΩMEGA system this
is achieved in a special command suggestion window within the graphical user
interface LΩUI [Hj. The sorting of the suggestions is done according to several
heuristic criteria, one of which is that commands with fully instantiated PAIs
are always preferred as their application may conclude a whole subproof.
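A minimal sketch of the two selection steps, under stated assumptions: `best_pai` and `rank_commands` are hypothetical names, "better" is reduced to "more actual arguments", and the only sorting heuristic modelled is the preference for fully instantiated PAIs (ties are broken by PAI size and command name, which is our own choice). The command `cmdA` is a made-up one-argument command used only for illustration.

```python
from typing import Dict, List, Optional

PAI = Dict[str, object]  # only instantiated arguments are stored here


def best_pai(suggestion_blackboard: List[PAI]) -> Optional[PAI]:
    """Most complete PAI with at least one actual argument, if any."""
    candidates = [pai for pai in suggestion_blackboard if pai]
    return max(candidates, key=len, default=None)


def rank_commands(suggestions: Dict[str, PAI], arity: Dict[str, int]) -> List[str]:
    """Order commands: fully instantiated PAIs first, then larger PAIs."""
    def sort_key(command: str):
        pai = suggestions[command]
        complete = len(pai) == arity[command]
        return (not complete, -len(pai), command)
    return sorted(suggestions, key=sort_key)


subst_board = [{}, {"u": "L1", "s": "L2"}, {"u": "L1", "s": "L2", "pl": ((1,),)}]
suggestions = {"=Subst": best_pai(subst_board), "cmdA": {"x": "L2"}}
ranking = rank_commands(suggestions, {"=Subst": 4, "cmdA": 1})
```

With these inputs the hypothetical `cmdA` is ranked first because its single argument is instantiated, while =Subst still lacks `eq`.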

3.4 Experiences 

Unfortunately, computations of single agents themselves can be very costly.
Reconsider the agents of command =Subst: In ΩMEGA we currently have 25 different
argument agents defined for =Subst, some of which are computationally highly
expensive. For example, while the agent $\mathfrak{A}^{eq}_{\emptyset}$ only tests head symbols of formulas
during its search for lines containing an equation and is therefore relatively
inexpensive, the agent $\mathfrak{A}^{u,s}_{\emptyset}$ performs computationally expensive matching
operations. In large proofs agents of the latter type might not only take a long time
before returning any useful result, but will also absorb a fair amount of system
resources, thereby slowing down the computations of other argument agents.

already tackles this problem partially by introducing a focusing technique 
that explicitly partitions a partial proof into subproblems in order to guide the 
search of the agents. This focusing technique takes two important aspects into 
account : 

(i) A partial proof often contains several open subgoals and humans usually 
focus on one such subgoal before switching to the next. 

(ii) Hypotheses and derived lines belonging to an open subgoal are chronologically
sorted where the interest focuses on the more recently introduced
lines.

Hence, the agents restrict their search to the actual subgoal (actual focus) and
guide their search according to the chronological order of the proof lines.

4 Resource Adapted Approach 

Since agents are implemented as independent threads a user can interrupt the
suggestion process by choosing a command at any time without waiting for all
possible suggestions to be made. An agent then either quits its computations
regularly or as soon as it detects that the blackboard it works for has been
reinitialized, i.e., when the user has executed a command. It then performs all
further computations with respect to the reinitialized blackboard. However, with
increasing size of proofs some agents never have the chance to write meaningful
suggestions to a blackboard. Therefore, these agents should be excluded from
the suggestion process altogether, especially if their computations are very costly
and deprive other agents of resources.

For this purpose we developed a concept of static complexity ratings where
a rating is attached to each argument and each command agent, that roughly
reflects the computational complexity involved for its suggestions. A global
complexity value can then be adjusted by the user, which makes it possible to suppress
computations of agents whose ratings are larger than the specified value. Furthermore,
commands can be completely excluded from the suggestion process. For example,
the agent $\mathfrak{A}^{u,s}_{\emptyset}$ has a higher complexity rating than $\mathfrak{A}^{eq}_{\emptyset}$ from the =Subst
example, since recursively matching terms is generally a harder task than retrieving
a line containing an equation. The overall rating of a command agent is set
to the average rating of its single argument agents.
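The static scheme amounts to a threshold filter plus an average, which the following sketch illustrates. The function names and the numeric ratings are assumptions chosen for the example; the text does not give concrete rating values.

```python
from statistics import mean
from typing import Dict, Set


def active_agents(ratings: Dict[str, float], threshold: float) -> Set[str]:
    """Agents whose rating does not exceed the global complexity value."""
    return {agent for agent, rating in ratings.items() if rating <= threshold}


def command_rating(ratings: Dict[str, float]) -> float:
    """Overall rating of a command agent: average over its argument agents."""
    return mean(list(ratings.values()))


# Hypothetical ratings: matching (A^{u,s}) is rated higher than the
# head-symbol test (A^{eq}).
subst_ratings = {"A_eq": 1.0, "A_us": 8.0, "A_pl": 3.0}
running = active_agents(subst_ratings, threshold=5.0)
overall = command_rating(subst_ratings)
```

With a global complexity value of 5, the expensive matching agent is suppressed while the other two keep running.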

Although this rating system already increases the effectiveness of the command
suggestions, it is very inflexible, as ratings are assigned only by the programmer
of a particular agent. It is neither designed nor intended to be adjusted
by the user at runtime, as steadily controlling, e.g., more than 500 agents (this
number is easily reached by an interactive prover with only 50 tactics and an
average of 10 agents per associated command) would divert the user's
attention from his/her main intention, namely interactively proving theorems.
Moreover, since choosing an appropriate complexity rating depends on run-time
and computational performance, i.e. happens on a sub-symbolic level, the user
should as far as possible be spared this kind of fine-tuning of the mechanism.

5 Resource Adaptive Approach 

In this section we extend the resource adapted approach into a resource adaptive
one. While we retain the principle of activation/deactivation by comparing
the particular complexity ratings of the argument agents with the overall
deactivation threshold, we now allow the individual complexity ratings of argument
agents to be dynamically adjusted by the system itself. Furthermore, we introduce
a special classification agent which analyzes and classifies the current proof
goal in order to deactivate those agents which are not appropriate with respect
to the current goal.



5.1 Dynamic Adjustment of Ratings 

The dynamic adjustment takes place on both layers: On the bottom layer we
allow the argument agents to adjust their own ratings by reflecting on their
performance and contributions in the past. On the other hand the command agents
on the top layer adjust the ratings of their associated argument agents. This is
motivated by the fact that on this layer it is possible to compare the performance
and contribution of agent societies of the bottom layer.

Therefore, agents need an explicit concept of resources which enables them
to communicate and reason about their performance. The communication is
achieved by propagating resource information from the bottom to the top layer
and vice versa via the blackboards. The actual information is gathered by the
agents on the bottom layer of the architecture. Currently the argument agents
evaluate their effectiveness with respect to the following two measures:

1. the absolute cpu time the agents consume, and 

2. ‘the patience of the user’, before executing the next command. 

Measure (1) is an objective measure that is computed by each agent at runtime. Agents
then use these values to compute the average cpu time for the last n runs and
convert the result into a corresponding complexity rating.

Measure (2) is rather subjective and it expresses formally the ability of an
agent to judge whether it ever makes contributions to the command suggesting
process in the current proof state. Whenever an agent returns from a computation
without any new contribution to the suggestion blackboard, or even worse,
whenever an agent does not return before the user executes another command
(which reinitializes the blackboards), the agent receives a penalty that increases
its complexity rating. Consequently, when an agent fails to contribute several
times in a row, its complexity rating quickly exceeds the deactivation threshold
and the agent retires.
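The two measures can be combined into a self-adjusting rating along the following lines. This is a sketch under our own assumptions: the class name `RatedAgent`, the window size, the linear conversion factor `points_per_second`, the fixed additive `penalty`, and the fact that penalties are never reset are all illustrative choices that the text does not specify.

```python
from collections import deque


class RatedAgent:
    """Argument agent that derives its complexity rating from past runs."""

    def __init__(self, window: int = 5, penalty: float = 2.0,
                 points_per_second: float = 1.0) -> None:
        self.cpu_times = deque(maxlen=window)  # cpu times of the last n runs
        self.penalty = penalty                 # added per fruitless run
        self.points_per_second = points_per_second
        self.penalties = 0.0

    def record_run(self, cpu_seconds: float, contributed: bool) -> None:
        """Log one run; a run without contribution (or one cut off by the
        user executing a command) is penalised."""
        self.cpu_times.append(cpu_seconds)
        if not contributed:
            self.penalties += self.penalty

    @property
    def rating(self) -> float:
        if not self.cpu_times:
            return self.penalties
        average = sum(self.cpu_times) / len(self.cpu_times)
        return average * self.points_per_second + self.penalties

    def is_active(self, threshold: float) -> bool:
        return self.rating <= threshold


agent = RatedAgent(window=3, penalty=2.0)
agent.record_run(1.0, contributed=True)
rating_after_success = agent.rating
for _ in range(3):                       # three fruitless runs in a row
    agent.record_run(1.0, contributed=False)
```

After three fruitless runs the rating has grown from 1.0 to 7.0, so with a deactivation threshold of 5 the agent would retire.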

Whenever an argument agent updates its complexity rating this adjustment
is reported to the corresponding command agent via a blackboard entry. The
command agent collects all these entries, computes the average complexity rating
of its argument agents, and reports the complete resource information on its
society of argument agents to the command blackboard. The command blackboard
therefore steadily provides information on the effectiveness of all active
argument agents, as well as information on the retired agents and an estimation
of the overall effectiveness of every argument agent society.

An additional resource agent uses this resource information in order to reason 
about a possibly optimal resource adjustment for the overall system, taking the 
following criteria into account: 

- Assessment of absolute cpu times.

- A minimum number of argument agents should always be active. If the
number of active agents drops below this value the global complexity value
is readjusted in order to reactivate some of the retired agents.

- Agent societies with a very high average complexity rating and many retired
argument agents should get a new chance to improve their effectiveness.
Therefore the complexity ratings of the retired agents are lowered beneath
the deactivation threshold.

- In special proof states some command agents (together with their argument
agents) are excluded. For example, if a focused subproblem is a propositional
logic problem, commands invoking tactics dealing with quantifiers are
useless. This aspect is further elaborated in Sec. 5.2.

Results from the resource agent are propagated down in the agent society
and take precedence over the local resource adjustments of the single agents.
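The second criterion (keeping a minimum number of agents active) can be sketched as a readjustment of the global complexity value. The function name and the policy of raising the value to the smallest sufficient rating are assumptions; the text only states that the value is readjusted to reactivate some retired agents.

```python
from typing import Dict


def readjust_threshold(ratings: Dict[str, float], threshold: float,
                       min_active: int) -> float:
    """Raise the global complexity value until min_active agents are active."""
    active = sum(1 for rating in ratings.values() if rating <= threshold)
    if active >= min_active or not ratings:
        return threshold
    ordered = sorted(ratings.values())
    needed = min(min_active, len(ordered))
    # Smallest value that admits `needed` agents; never lower the threshold.
    return max(threshold, ordered[needed - 1])


# Hypothetical ratings of four argument agents.
ratings = {"a1": 1.0, "a2": 4.0, "a3": 9.0, "a4": 12.0}
raised = readjust_threshold(ratings, threshold=3.0, min_active=3)
unchanged = readjust_threshold(ratings, threshold=10.0, min_active=3)
```

In the first call only one agent is active at value 3, so the value is raised to 9 to reactivate two more; in the second call three agents are already active and nothing changes.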



5.2 Informed Activation & Deactivation 

Most tactics in an interactive theorem prover are implicitly associated with a
specific logic (e.g., propositional, first-order, or higher-order logic) or even with a
specific mathematical theory (e.g., natural numbers, set theory). This obviously
also holds for the proof problems examined in a mathematical context. Some
systems, for instance the ΩMEGA system, even explicitly maintain this
knowledge by administering all rules, tactics, etc., as well as all proof problems
within a hierarchically structured theory database. This kind of classification
knowledge can fruitfully be employed by our agent mechanism to activate
appropriate agents and especially to deactivate inappropriate ones. Even if a given
proof problem cannot be associated with a very restrictive class (e.g.,
propositional logic) from the start, some of the subproblems subsequently generated
during the proof probably can. This can be illustrated with our example: The
original proof problem belonging to higher-order logic gets transformed by the
backward application of $\Rightarrow_I$, and $=2=$ into a very simple propositional
logic problem (cf. line $L_4$). In this situation agents associated with a command
from first- or higher-order logic (like =Subst, $\forall_E$, or Leo) should be disabled,
whereas other agents could use this information in order to suggest the application
of an automatic theorem prover that can efficiently deal with propositional
logic. In the case of our example the mechanism would subsequently suggest
applying Otter to the remaining problem.

Therefore, we add a classification agent to our suggestion mechanism whose
only task is to investigate each new subgoal in order to classify it with respect to
the known theories or logics. As soon as this agent is able to associate the current
goal with a known class or theory it places an appropriate entry on the command
blackboard (cf. "HO" entry in Fig. 1). This entry is then broadcast to the
lower layer suggestion blackboards by the command agents where it becomes
available to all argument agents. Each argument agent can now compare its own
classification knowledge with the particular entry on the suggestion blackboard
and decide whether it should perform further computations within the current
proof state or not.
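The decision each argument agent takes on seeing the classification entry can be sketched as a comparison of class labels. The linear ordering of logics, the labels "PL" and "FO" (only "HO" appears in the text), and the function name are assumptions made for illustration; a hierarchically structured theory database would need a richer comparison.

```python
# Hypothetical ordering of logic classes from most to least restrictive.
LOGIC_LEVEL = {"PL": 0, "FO": 1, "HO": 2}


def should_run(agent_class: str, goal_class: str) -> bool:
    """An agent keeps working only if the current goal is at least as
    expressive as the logic its command belongs to."""
    return LOGIC_LEVEL[agent_class] <= LOGIC_LEVEL[goal_class]


# Agents for =Subst are associated with higher-order logic in this sketch.
subst_on_propositional_goal = should_run("HO", "PL")
subst_on_higher_order_goal = should_run("HO", "HO")
```

Under this assumption the =Subst agents stop once the focused subgoal is classified as propositional, while agents of propositional commands stay active for any goal class.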

The motivation for designing the subgoal classifying component as an agent
itself is clear: It can be very costly to examine whether a given subgoal belongs
to a specific theory or logic. Therefore this task should be performed concurrently
by the suggestion mechanism and not within each initialization phase of
the blackboard mechanism. Whereas our current architecture provides one single
classification agent only, the single algorithms and tests employed by this component
can generally be further distributed by using a whole society of classification
agents.



Leo is a higher-order theorem prover integrated in ΩMEGA [4].

By appropriately extending the message passing and communication facilities
of the agents and of the blackboards, it will even be possible to pass control
knowledge (e.g., abstract knowledge on the recent proof attempt as a whole or on
the user's preferences) from ΩMEGA's conceptual proof planning layer successively
to the parameter and command agent layer.

6 Conclusion and Future Work 

In this paper we reported on the extension of the concurrent command suggestion 
mechanism to a resource adaptive approach. The resources that influence the 
performance of our system are: 

(i) The available computation time and memory space. 

(ii) Classification knowledge on the single agents and the agent societies. 

(iii) Criteria and algorithms available to the classification agent. 

Our approach can be considered as an instance of a boundedly rational sy- 
stem [2TI20I . The work is also related to [13] which presents an abstract resource 
concept for multi-layered agent architectures. [T^ describes a successful applica- 
tion of this framework within the Robocup simulation. Consequently some future 
work should include a closer comparison of our mechanism with this work. 

The idea to use parallel processing within automated deduction dates back 
to the seventies |18| . Recent work is mainly on frameworks for concurrent or 
cooperating automated theorem provers mm- These frameworks generally 
involve only a very limited number of single reasoning agents. However, there 
are already some attempts to introduce full-scale multi-agent technology within 
proof planning m- 

In contrast, our work is, at least initially, not designed for realizing agent- 
based automated theorem proving but for assisting the user in interactive theo- 
rem proving. Our suggestion mechanism enhances traditional user support with 
the following features: 

Flexibility: The user can choose freely among several given suggestions and
can even communicate with the system about parameter instantiations (by
pre-specifying particular instantiations as constraints for the mechanism).

Anytime character: At any given point the command blackboard contains the
heuristically best-rated suggestions with respect to the state of the parameter
agents' computations.

Robustness: Single faulty agent specifications have only a minor influence on
the quality of the overall suggestion mechanism and, in contrast to
traditional sequential mechanisms, semi-decidable specification criteria can be
employed.

Expandability: The sketched mechanism is not restricted to rules and tactics
and can be applied to arbitrary commands (e.g., to support an intelligent
flag-setting for external/internal reasoners with respect to the current proof
state).

User adaptability: Expert users may define their own suggestion agents. It
should be possible to extend the approach such that it takes particular user
preferences for certain commands (types of proofs) into account.

The presented extensions are currently implemented and analyzed in ΩMEGA.
This might yield further possible refinements of the resource concepts to improve 
the performance of the mechanism. Another question in this context is, whether 
learning techniques can support our resource adjustments on the top layer, as 
it seems to be reasonable that there even exist appropriate resource patterns 
for the argument agents in dependence of the focused subproblem. We speculate 
that the presented mechanism can be further extended to form an intelligent, 
resource- and user-adaptive partner for the user in an interactive theorem prover. 

Another application we are currently investigating is the use of the agent me- 
chanism within a proof planning scenario as introduced in |Z] . Since the approach 
is not restricted to specify applicability conditions for rules and tactics only, it 
can analogously be employed for proof methods as well. So far the specification 
of the argument agents for rules and tactics describes structural properties of 
single arguments as well as structural dependencies between the different arguments.
Similarly we can specify agents which check, probably guided by available
control knowledge, the particular pre-conditions of a proof method, i.e. check for
proof lines matching those required for a method to be applicable or verify
additional application conditions of the method. Thus, we speculate that
to some extent concurrency can be exploited even within a traditional proof
planner when computing the applicable methods with respect to the given proof
goal by cooperating argument agents in each proof step [6].
Abstract. This paper investigates simple, well-behaved syntactic methods to
fuse prioritized knowledge bases which are semantically meaningful in the
frameworks of possibility theory and of Spohn's ordinal conditional functions.
Different types of scales for priorities are discussed: finite vs. infinite,
numerical vs. ordinal. Syntactic fusion is envisaged here as a process which
combines prioritized knowledge bases into a new prioritized knowledge base,
and thus allows for subsequent iteration. Several fusion operations are
proposed, according to whether or not the sources are dependent, or conflicting,
or sharing the same scale.

Keywords: Knowledge representation, Knowledge fusion, possibility theory. 



1. Introduction 

The fusion of (possibly inconsistent) prioritized knowledge bases from multiple 
sources is a key issue in modern information management and in database merging 
[1; 7; 15]. Applications of fusion will be critical in many domains like electronic 
commerce, group decision making support software, team-based software 
development, etc. 

In this paper we show how possibility theory provides an expressive framework for
fusing information from multiple sources. Typically, knowledge bases are graded
(prioritized) according to the reliability of the information they contain. In
possibilistic logic, a prioritized knowledge base is a set of weighted formulas of the
form $(\phi, a)$ where $\phi$ is a classical propositional formula and $a$ is a weight which
belongs to $[0,1]$, and is understood to be a lower bound of a necessity measure. This
weight accounts for the level of certainty (or the priority) of the information
represented by the logical formula. Each possibilistic knowledge base induces a
complete pre-order on interpretations that can be encoded by means of possibility
distributions [11], or equivalently Spohn's kappa functions [18; 19]. The
possibilistic setting does not necessarily require a numerical scale such as $[0,1]$ but
can also be used with finite linearly ordered scales. For notational simplicity, in the
following we use a numerical encoding of the ordering in $[0,1]$.

At the semantic level, fusing possibility distributions can be easily achieved if the
commensurability assumption is made, namely when the sources share the same
meaning of the scale $[0,1]$. In this paper, many fusion modes are defined, according
to whether or not the sources are dependent, or conflicting, or commensurate.
Syntactic fusion modes, which directly operate on knowledge bases, should agree
with the semantic ones previously developed (e.g., [10]) in order to be meaningful.
Clearly, the advantage of such an approach is to produce a new knowledge base as a
result, which allows for the subsequent iteration of the fusion process. An efficient
way to do this is proposed in this paper, which is more generally devoted to
semantically meaningful syntactic fusing methods.

Belief revision [12] can be viewed as a fusing process where the new incoming 
information has priority over the existing accumulated information in a priori 
knowledge base. However, fusion differs from revision since it is basically 
considered as a symmetric operation, namely it does not necessarily distinguish the 
new information. Both fusion and revision involve inconsistency handling. 

The next section restates the necessary background on possibility theory (which can
be defined on any finite or infinite linearly ordered scale) and on Spohn's kappa
functions (usually defined with the integer scale), and shows the connection between
these two frameworks. Section 3 provides a general tool for the syntactic fusion of
both possibilistic and kappa function-based prioritized knowledge bases.



2. Semantics of prioritized knowledge bases 

Let $\mathcal{L}$ be a finite propositional language. $\vdash$ denotes the classical consequence
relation; Greek letters $\phi, \psi, \ldots$ represent formulas. $\Omega$ is the set of classical
interpretations, and $[\phi]$ the set of classical models of $\phi$.

Often epistemic states, viewed as a set of knowledge about the real world (based on
the available information), are represented semantically by a total pre-order either on
$\Omega$ or on the set of formulas. The latter is called an epistemic entrenchment relation
[12]. These orderings reflect the strength of the various pieces of knowledge maintained
by an agent. A priority ordering over knowledge can be encoded using different types of
scale: a finite linearly ordered scale, the integers (possibly completed by $+\infty$), the
unit interval $[0,1]$, etc. According to the scale, the strength of knowledge ranges
from a purely ordinal notion to a numerical quantity.

2.1. Possibility theory 

Semantic representation of epistemic states 

In a possibility theory framework, at the semantic level, an epistemic state is
represented by a possibility distribution $\pi$ which is a mapping from $\Omega$ to the
interval $[0,1]$. $\pi(\omega)$ represents the degree of compatibility of $\omega$ with the available
information (or beliefs) about the real world. By convention, $\pi(\omega)=0$ means that the
interpretation $\omega$ is impossible, and $\pi(\omega)=1$ means that nothing prevents $\omega$ from
being the real world. When $\pi(\omega)>\pi(\omega')$, $\omega$ is a preferred candidate to $\omega'$ for being
the real state of the world. A possibility distribution $\pi$ is said to be normal if
$\exists\omega\in\Omega$ such that $\pi(\omega)=1$, namely there exists at least one interpretation which is
consistent with all the available beliefs.

Given a possibility distribution $\pi$, we can define two different ways to rank-order
formulas of the language from this possibility distribution. This is obtained using
two mappings grading respectively the possibility and the certainty of a formula $\phi$:

- the possibility (or consistency) degree $\Pi(\phi) = \max\{\pi(\omega) : \omega \in [\phi]\}$, which
evaluates the extent to which $\phi$ is consistent with the available beliefs expressed by $\pi$ [21]. It
satisfies:

$$\forall\phi\ \forall\psi\quad \Pi(\phi \vee \psi) = \max(\Pi(\phi), \Pi(\psi));$$

- the necessity (or certainty, entailment) degree $N(\phi) = 1 - \Pi(\neg\phi)$, which evaluates
the extent to which $\phi$ is entailed by the available beliefs. We have:

$$\forall\phi\ \forall\psi\quad N(\phi \wedge \psi) = \min(N(\phi), N(\psi)).$$

The duality equation $N(\phi) = 1 - \Pi(\neg\phi)$ extends the existing one in classical logic,
where a formula is entailed from a set of classical formulas if and only if its
negation is not consistent with this set. Clearly $N(\phi)>0$ implies $\Pi(\phi)=1$, which
means that a formula, before being entailed by the available beliefs, should be
consistent with them.

Lastly, given a possibility distribution $\pi$, the semantic determination of the belief
set (corresponding to the agent's current beliefs), denoted by $BS(\pi)$, is obtained in the
usual way in nonmonotonic reasoning. We denote by $\mathrm{Pref}(\pi)$ the set of
interpretations with maximal weights, namely:

$$\mathrm{Pref}(\pi) = \{\omega : \text{there is no } \omega' \text{ such that } \pi(\omega') > \pi(\omega)\}.$$

Then:

$$BS(\pi) = \{\phi : [\phi] \supseteq \mathrm{Pref}(\pi)\}.$$

When $\pi$ is normal, then we can check that:

$$BS(\pi) = \{\phi : N(\phi) > 0\}.$$
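The measures above are straightforward to compute by enumerating interpretations. The sketch below does so for a two-variable language; the encoding (interpretations as tuples of truth values, formulas as Python predicates), the function names and the sample distribution are our own assumptions, and $\Pi$ of an unsatisfiable formula is taken to be 0.

```python
from itertools import product
from typing import Callable, Dict, Tuple

VARS = ("q", "r")
World = Tuple[bool, ...]                     # truth values in the order of VARS
Formula = Callable[[Dict[str, bool]], bool]  # formulas as predicates
OMEGA = list(product([True, False], repeat=len(VARS)))


def holds(phi: Formula, w: World) -> bool:
    return phi(dict(zip(VARS, w)))


def possibility(pi: Dict[World, float], phi: Formula) -> float:
    """Pi(phi): highest degree among the models of phi (0 if there are none)."""
    return max((pi[w] for w in OMEGA if holds(phi, w)), default=0.0)


def necessity(pi: Dict[World, float], phi: Formula) -> float:
    """N(phi) = 1 - Pi(not phi)."""
    return 1.0 - possibility(pi, lambda v: not phi(v))


def believed(pi: Dict[World, float], phi: Formula) -> bool:
    """phi is in BS(pi) iff every preferred interpretation satisfies phi."""
    top = max(pi.values())
    return all(holds(phi, w) for w in OMEGA if pi[w] == top)


# A sample normal distribution over the interpretations of q and r.
pi = {(True, True): 1.0, (True, False): 1.0,
      (False, True): 0.7, (False, False): 0.5}
q = lambda v: v["q"]
q_or_r = lambda v: v["q"] or v["r"]
q_and_q_or_r = lambda v: q(v) and q_or_r(v)
```

For this distribution $N(q)$ evaluates to about 0.3 and $N(q \vee r)$ to 0.5, and `believed(pi, q)` agrees with the test $N(q) > 0$, as expected for a normal $\pi$.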

Syntactic representation of epistemic states 

An epistemic state can also be represented syntactically by specifying explicitly the
certainty degree of some formulas of the language; we call such a specification a possibilistic
knowledge base. A possibilistic knowledge base is made up of a finite set of
weighted formulas

$$\Sigma = \{(\phi_i, a_i),\ i = 1, \ldots, n\}$$

where $a_i$ is understood as a lower bound on the degree of necessity $N(\phi_i)$. Formulas
with null degree are not explicitly represented in the knowledge base (only beliefs
which are somewhat accepted by the agent are explicitly represented). The higher the
weight, the more certain the formula.

Definition 1: Let $\Sigma$ be a possibilistic knowledge base, and $a \in [0,1]$. We call the
$a$-cut (resp. strict $a$-cut) of $\Sigma$, denoted by $\Sigma_{\geq a}$ (resp. by $\Sigma_{>a}$), the set of classical
formulas in $\Sigma$ having a certainty degree at least equal to (resp. strictly greater than) $a$.

A possibilistic knowledge base $\Sigma$ is said to be consistent if its classical knowledge
base, obtained by forgetting the weights, is classically consistent. We denote by:

$$\mathrm{Inc}(\Sigma) = \max\{a_i : \Sigma_{\geq a_i} \text{ is inconsistent}\}$$

the inconsistency degree of $\Sigma$. $\mathrm{Inc}(\Sigma) = 0$ means that $\Sigma_{\geq a_i}$ is consistent for all $a_i$.



Lastly, the syntactic computation of the belief set, denoted by $BS(\Sigma)$, is obtained by
considering the deductive closure of the set of classical formulas of $\Sigma$ having a
certainty degree higher than $\mathrm{Inc}(\Sigma)$, namely:

$$BS(\Sigma) = Cn\{\phi_i : (\phi_i, a_i) \in \Sigma \text{ and } a_i > \mathrm{Inc}(\Sigma)\}.$$

Checking if a formula belongs to $BS(\Sigma)$ can be done efficiently, with a complexity
close to that of classical logic.
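The following sketch computes cuts, the inconsistency degree and belief-set membership by brute-force enumeration of interpretations. It is meant only to make the definitions concrete: the function names and the sample base are assumptions, and truth-table enumeration stands in for a real satisfiability test, so it does not reflect the efficiency remark above.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

VARS = ("q", "r")
Formula = Callable[[Dict[str, bool]], bool]
Base = List[Tuple[Formula, float]]  # weighted formulas (phi_i, a_i)
OMEGA = [dict(zip(VARS, bits)) for bits in product([True, False], repeat=len(VARS))]


def models(formulas: List[Formula]) -> List[Dict[str, bool]]:
    return [w for w in OMEGA if all(f(w) for f in formulas)]


def cut(base: Base, a: float, strict: bool = False) -> List[Formula]:
    """a-cut (weights >= a) or strict a-cut (weights > a) of the base."""
    return [phi for phi, b in base if (b > a if strict else b >= a)]


def inc(base: Base) -> float:
    """Largest weight a_i whose a_i-cut is inconsistent; 0 if none is."""
    bad = [a for _, a in base if not models(cut(base, a))]
    return max(bad, default=0.0)


def in_belief_set(base: Base, phi: Formula) -> bool:
    """phi follows classically from the formulas strictly above Inc."""
    kept = cut(base, inc(base), strict=True)
    return all(phi(w) for w in models(kept))


q = lambda w: w["q"]
not_q = lambda w: not w["q"]
r = lambda w: w["r"]
sigma = [(q, 0.8), (not_q, 0.4), (r, 0.2)]  # hypothetical inconsistent base
```

For this base the 0.4-cut $\{q, \neg q\}$ is inconsistent, so $\mathrm{Inc}(\Sigma) = 0.4$ and the belief set is $Cn\{q\}$; the formula $r$ is not retained because its weight lies below the inconsistency degree.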

From the syntactic to the semantic representation 

Given a possibilistic knowledge base $\Sigma$, we can generate a unique possibility
distribution by associating to each interpretation its level of compatibility with the
agent's beliefs, i.e., with $\Sigma$. When a possibilistic knowledge base only consists of
one formula $\{(\phi, a)\}$, then each interpretation $\omega$ which satisfies $\phi$ will have the
possibility degree $\pi(\omega) = 1$ since it is consistent with $\phi$, and each interpretation $\omega$
which falsifies $\phi$ will have a possibility degree $\pi(\omega)$ such that the higher $a$ is (i.e.,
the more certain $\phi$ is), the lower $\pi(\omega)$ is. In particular, if $a=1$ (i.e., $\phi$ is completely
certain), then $\pi(\omega) = 0$, namely $\omega$ is impossible. One way to realize this constraint
is to assign to $\pi(\omega)$ the degree $1 - a$ with a numerical encoding. Therefore, the
possibility distribution associated with $\Sigma=\{(\phi, a)\}$ is:

$$\forall\omega\in\Omega,\quad \pi_{\{(\phi, a)\}}(\omega) = \begin{cases} 1 & \text{if } \omega \in [\phi] \\ 1 - a & \text{otherwise.} \end{cases}$$

When $\Sigma = \{(\phi_i, a_i),\ i=1,\ldots,n\}$ is a general possibilistic knowledge base then all the
interpretations satisfying all the beliefs in $\Sigma$ will have the highest possibility
degree, namely 1, and the other interpretations will be ranked w.r.t. the highest
belief that they falsify, namely we get [11]:

Definition 2: The possibility distribution associated with a knowledge base $\Sigma$ is
defined by:

$$\forall\omega\in\Omega,\quad \pi_\Sigma(\omega) = \begin{cases} 1 & \text{if } \forall(\phi_i, a_i) \in \Sigma,\ \omega \in [\phi_i] \\ 1 - \max\{a_i : (\phi_i, a_i) \in \Sigma \text{ and } \omega \notin [\phi_i]\} & \text{otherwise.} \end{cases}$$

Thus, $\pi_\Sigma$ can be viewed as the result of the combination of the $\pi_{\{(\phi_i, a_i)\}}$'s using the
minimum operator, that is:

$$\pi_\Sigma(\omega) = \min\{\pi_{\{(\phi_i, a_i)\}}(\omega) : (\phi_i, a_i) \in \Sigma\}.$$

Example 1:

Let $\Sigma=\{(q, .3), (q \vee r, .5)\}$.

Then

$$\pi_\Sigma(qr) = \pi_\Sigma(q\neg r) = 1;\quad \pi_\Sigma(\neg q r) = .7;\quad \pi_\Sigma(\neg q \neg r) = .5.$$

The two interpretations $qr$ and $q\neg r$ are the preferred ones since they are the only ones
which are consistent with $\Sigma$, and $\neg q r$ is preferred to $\neg q \neg r$, since the highest belief
falsified by $\neg q r$ (i.e., $(q, .3)$) is less certain than the highest belief falsified by $\neg q \neg r$
(i.e., $(q \vee r, .5)$).
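Definition 2 and Example 1 can be checked mechanically. The sketch below computes $\pi_\Sigma$ both directly and as the minimum of the single-formula distributions; the encoding of interpretations as dictionaries and all function names are assumptions made for this illustration.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

VARS = ("q", "r")
Formula = Callable[[Dict[str, bool]], bool]
Base = List[Tuple[Formula, float]]  # weighted formulas (phi_i, a_i)
OMEGA = [dict(zip(VARS, bits)) for bits in product([True, False], repeat=len(VARS))]


def pi_single(phi: Formula, a: float, w: Dict[str, bool]) -> float:
    """Distribution induced by the one-formula base {(phi, a)}."""
    return 1.0 if phi(w) else 1.0 - a


def pi_sigma(base: Base, w: Dict[str, bool]) -> float:
    """Definition 2: 1 minus the weight of the strongest falsified belief."""
    falsified = [a for phi, a in base if not phi(w)]
    return 1.0 - max(falsified) if falsified else 1.0


def pi_by_min(base: Base, w: Dict[str, bool]) -> float:
    """Same distribution, obtained with the minimum operator."""
    return min((pi_single(phi, a, w) for phi, a in base), default=1.0)


# The base of Example 1.
sigma = [(lambda w: w["q"], 0.3), (lambda w: w["q"] or w["r"], 0.5)]
degrees = {(w["q"], w["r"]): pi_sigma(sigma, w) for w in OMEGA}
```

The dictionary `degrees` reproduces the values of Example 1, and `pi_by_min` returns the same degrees for every interpretation.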

The possibility distribution $\pi_\Sigma$ is not necessarily normal; however, $\pi_\Sigma$ is normal
iff $\Sigma$ is consistent. Moreover, it can be verified that:

$$\mathrm{Inc}(\Sigma) = 1 - \max_{\omega\in\Omega} \pi_\Sigma(\omega).$$

It can be easily verified that the syntactic computation of belief sets coincides with
the semantic one:

$$BS(\Sigma) = BS(\pi_\Sigma).$$

It is important to notice that defining $\phi > \psi$ iff $N_\Sigma(\phi) > N_\Sigma(\psi)$ yields an epistemic
entrenchment relation, where $N_\Sigma$ is the necessity measure associated with $\pi_\Sigma$.
Moreover, $N_\Sigma$ can be immediately computed from $\Sigma$ as follows:

$$N_\Sigma(\phi) = \max\{a_i : \Sigma_{\geq a_i} \vdash \phi\}.$$
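The agreement between the syntactic and the semantic computation of $N_\Sigma$ can be checked on the base of Example 1. The sketch below is an illustration under our own assumptions: entailment is decided by truth-table enumeration, the function names are hypothetical, and only a consistent base with non-tautological query formulas is considered.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

VARS = ("q", "r")
Formula = Callable[[Dict[str, bool]], bool]
Base = List[Tuple[Formula, float]]  # weighted formulas (phi_i, a_i)
OMEGA = [dict(zip(VARS, bits)) for bits in product([True, False], repeat=len(VARS))]


def entails(formulas: List[Formula], phi: Formula) -> bool:
    """Classical entailment by enumeration of all interpretations."""
    return all(phi(w) for w in OMEGA if all(f(w) for f in formulas))


def necessity_syntactic(base: Base, phi: Formula) -> float:
    """max{a_i : the a_i-cut of the base entails phi} (0 if no cut does)."""
    weights = [a for _, a in base
               if entails([f for f, b in base if b >= a], phi)]
    return max(weights, default=0.0)


def pi_sigma(base: Base, w: Dict[str, bool]) -> float:
    falsified = [a for f, a in base if not f(w)]
    return 1.0 - max(falsified) if falsified else 1.0


def necessity_semantic(base: Base, phi: Formula) -> float:
    """N(phi) = 1 - Pi(not phi), computed from the induced distribution."""
    counter = [pi_sigma(base, w) for w in OMEGA if not phi(w)]
    return 1.0 - max(counter, default=0.0)


q = lambda w: w["q"]
q_or_r = lambda w: w["q"] or w["r"]
r = lambda w: w["r"]
sigma = [(q, 0.3), (q_or_r, 0.5)]  # the base of Example 1
```

Both computations give about 0.3 for $q$, 0.5 for $q \vee r$ and 0 for $r$, which is not entailed by any cut.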

The following definition and lemmas are useful for the rest of the paper. 

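Definition 3: Let $(\phi, a)$ be a belief in $\Sigma$. Then $(\phi, a)$ is said to be subsumed by
$\Sigma$ if:

$$(\Sigma - \{(\phi, a)\})_{\geq a} \vdash \phi,$$

and $(\phi, a)$ is said to be strictly subsumed by $\Sigma$ if $\Sigma_{>a} \vdash \phi$.

Lemma 1: Let $(\phi, a)$ be a subsumed belief of $\Sigma$. Then $\Sigma$ and $\Sigma' = \Sigma - \{(\phi, a)\}$ are
equivalent, namely: $\pi_\Sigma = \pi_{\Sigma'}$.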

All the proofs of propositions of this paper can be found in the technical report [5].
As a corollary of the previous lemma, we can add or remove subsumed beliefs 
without changing the possibility distribution. This means that several syntactically 
different possibilistic knowledge bases may have the same possibility distribution as 
a semantic counterpart. In such a case, it can be shown that their a-cuts, which are 
classical knowledge bases, are logically equivalent in the usual sense. The next 
lemma exhibits similar conclusions when we remove tautologies from knowledge 
bases: 

Lemma 2: Let $(\top, a)$ be a tautological belief of $\Sigma$. Then $\Sigma$ and $\Sigma' = \Sigma - \{(\top, a)\}$ are
equivalent, namely: $\pi_\Sigma = \pi_{\Sigma'}$.

2.2. Kappa-functions framework 

The semantic representation of epistemic states in the kappa functions framework
[18] is basically the same as in possibility theory, except that rather than associating
to each interpretation $\omega$ a degree in [0, 1], we associate it with an integer
$\kappa(\omega)$. The lower $\kappa(\omega)$ is, the more preferred $\omega$ is. $\kappa(\omega) = +\infty$ means that $\omega$ is
impossible, while $\kappa(\omega) = 0$ means that absolutely nothing prevents $\omega$ from being
the real world, because it is consistent with the agent's beliefs. When $\kappa(\omega) < \kappa(\omega')$,
$\omega$ is preferred to $\omega'$. $\kappa$ is said to be normal if $\exists \omega \in \Omega$ such that $\kappa(\omega) = 0$. From a
kappa distribution, a ranking can be defined on formulas (using classical notations):

$$\kappa(\phi) = \min\{\kappa(\omega) : \omega \in [\phi]\} \text{ and } \kappa(\bot) = +\infty. \qquad (\#)$$

The syntactic representation in the kappa-function framework, largely developed by
Williams [20], differs slightly from that in the possibilistic logic framework.
Williams starts with sets of integer-valued formulas $K = \{(\phi_i, k_i) : k_i \in \mathbb{N} \cup \{+\infty\}\}$
which are partial epistemic entrenchment rankings, namely those which satisfy
$\forall (\phi_i, k_i) \in K$:

i) if $\vdash \phi_i$ then $k_i = +\infty$ (tautology).

ii) $\{\psi_j : k_j > k_i\} \nvdash \phi_i$ (non redundancy).

iii) if $K$ is inconsistent, then $\forall (\phi_i, k_i) \in \min(K)$, $k_i = 0$.

where $\min(K)$ contains all the formulas with lowest rank in $K$. Intuitively, partial
epistemic entrenchment rankings represent an agent's explicit beliefs, where the higher
the rank assigned to formulas in $K$ the more firmly held they are.

The first condition simply means that tautologies get the highest rank. The second
condition means that $(\phi_i, k_i)$ should not be entailed by formulas of rank higher than
$k_i$, and the last condition means that when $K$ is inconsistent the formulas with
lowest rank should have rank 0, the rank of inconsistent formulas.

Clearly, possibilistic knowledge bases are not necessarily partial epistemic
entrenchment rankings. However if we remove tautologies and subsumed beliefs,
which leads to an equivalent knowledge base due to Lemmas 1 and 2, then the result
satisfies i) and ii). Moreover if $\Sigma$ is consistent then condition iii) is also satisfied.
When $\Sigma$ is inconsistent, beliefs with lowest weight are not necessarily assigned 0.

Given a partial epistemic entrenchment ranking $K$, Williams [20] gives a way to
extend it to a full epistemic entrenchment ranking. This is done by first defining:

$$\mathrm{Exp}(K) = \{(\phi_i, k_i) \in K : k_i > 0\},$$

the set of explicit beliefs of an agent. $\mathrm{Exp}(K)$ is always consistent. The full
epistemic entrenchment associated with $K$ is obtained by associating a unique rank
to each belief $\phi$, denoted by $z(\phi)$ and defined in the following way:

$$z(\phi) = \begin{cases} 0 & \text{if } \mathrm{Exp}(K) \nvdash \phi \\ \max\{k_i : \{\psi_j : (\psi_j, k_j) \in K,\ k_j \geq k_i\} \vdash \phi\} & \text{otherwise;} \end{cases}$$

and $z(\top) = +\infty$.

This direct approach for generating a full epistemic entrenchment is basically the 
same as the one in possibilistic logic. However usually in possibility theory, given 
a possibilistic knowledge base, a possibility distribution is generated instead of a 
necessity measure while in the kappa-function framework a full epistemic 
entrenchment is generated rather than a ranking on the set of interpretations. This is 
a matter of convenience, since each full epistemic entrenchment (or a necessity 
measure) uniquely determines a total pre-order on the set of interpretations (or a 
possibility distribution) and conversely. 

We can easily check that the function $z$ can be equivalently recovered from the
equation (#) and the following $\kappa_K$:

$$\kappa_K(\omega) = \max\{k_i : (\psi_i, k_i) \in K \text{ and } \omega \notin [\psi_i]\} \qquad (\#\#)$$

with $\max(\emptyset) = 0$. Indeed:

Proposition 1: for each $\phi \neq \top$, $\phi \neq \bot$, $\kappa_K(\phi) = z(\neg \phi)$.

Table 1 summarizes how to translate from kappa functions to possibility theory. It
indicates that possibility transforms of kappa functions are valued on a particular
subset of rationals in [0,1].



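kappa-functions       Possibility theory

$\kappa(\omega)$      $\pi_\kappa(\omega) = 2^{-\kappa(\omega)}$
$\kappa(\phi)$        $\Pi_\kappa(\phi) = 2^{-\kappa(\phi)}$
$K$                   $\Sigma_K = \{(\phi_i, 1 - 2^{-k_i}) : (\phi_i, k_i) \in K\}$

Table 1: From kappa-functions to possibility theory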



Clearly, letting $\pi_\kappa(\omega) = 2^{-\kappa(\omega)}$ leads to:

$$\Pi_\kappa(\phi) = \max\{\pi_\kappa(\omega) : \omega \in [\phi]\} = \max\{2^{-\kappa(\omega)} : \omega \in [\phi]\} = 2^{-\min\{\kappa(\omega) : \omega \in [\phi]\}} = 2^{-\kappa(\phi)}.$$

The converse transformation is only possible when $\kappa(\omega) = -\log_2(\pi(\omega))$ takes its
value in the set of integers. Clearly, the advantage of the scale [0,1] is its capacity to
accommodate as many intermediary levels as is necessary for expressing the ranking
between beliefs.
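The translation of Table 1 and its partial converse can be sketched as follows. This is a minimal illustration, not taken from the paper: the function names, the use of exact fractions, and the sample ranks are assumptions chosen here so that the integrality condition on $-\log_2(\pi(\omega))$ can be tested exactly.

```python
import math
from fractions import Fraction

def kappa_to_possibility(kappa):
    """First row of Table 1: pi(w) = 2 ** (-kappa(w)); rank +inf maps to 0."""
    return {w: (Fraction(0) if k == math.inf else Fraction(1, 2 ** k))
            for w, k in kappa.items()}

def possibility_to_kappa(pi):
    """Converse map; it exists only when -log2(pi(w)) is an integer."""
    kappa = {}
    for w, p in pi.items():
        p = Fraction(p)
        if p == 0:
            kappa[w] = math.inf
        elif p.numerator == 1 and (p.denominator & (p.denominator - 1)) == 0:
            # The denominator is a power of two, 2 ** k; recover k.
            kappa[w] = p.denominator.bit_length() - 1
        else:
            raise ValueError(f"{p} is not of the form 2 ** (-k)")
    return kappa

def rank_of_formula(kappa, models):
    """Equation (#): kappa(phi) is the minimum rank over the models of phi."""
    return min((kappa[w] for w in models), default=math.inf)

# Illustrative ranks over four hypothetical interpretations.
kappa = {"w1": 0, "w2": 1, "w3": 3, "w4": math.inf}
pi = kappa_to_possibility(kappa)
phi_models = ["w2", "w3"]
print(rank_of_formula(kappa, phi_models), max(pi[w] for w in phi_models))
```

A degree such as 0.7 has no integer rank, so `possibility_to_kappa` raises `ValueError` for it. This is the sense in which [0,1] accommodates more intermediary levels than the integer scale.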



3. Merging knowledge bases 

Several authors, briefly surveyed in Sub-section 3.4., have considered the merging of 
information coming from different parallel sources. The aim of fusion is to obtain a 
global point of view, by exploiting the complementarity between the sources, 
solving different existing conflicts, reducing imprecision and uncertainty, as well as 
removing redundancies if independent sources. 

This section addresses the problem of fusing ordered pieces of information, 
semantically and syntactically, in the framework of possibility theory (and also in 
the framework of kappa-functions using Table 1). 

In the following, we first describe the syntactic counterpart of the combination of
two possibility distributions by an operator $\oplus$ which is very weakly constrained.
Then, we discuss some particular cases of $\oplus$ which are of special interest.

3.1. Fusing weighted bases with commensurability assumption 

Let $\Sigma_1$ and $\Sigma_2$ be two possibilistic knowledge bases. Let $\pi_1$ and $\pi_2$ be their
associated possibility distributions given by Definition 2. Let $\oplus$ be a two place
function whose domain is $[0,1] \times [0,1]$, to be used for aggregating $\pi_1(\omega)$ and $\pi_2(\omega)$
into $\pi_\oplus(\omega)$ for any $\omega$. This presupposes a commensurability assumption between
the scales used by each source for expressing the strength of its beliefs. This is
sanctioned by the use of [0,1] as a common scale. Then from $\Sigma_1$ and $\Sigma_2$, we are
interested in building a new possibilistic knowledge base $\Sigma_\oplus$ such that $\pi_{\Sigma_\oplus} = \pi_\oplus$,
where $\pi_{\Sigma_\oplus}$ is the possibility distribution associated to $\Sigma_\oplus$ using Definition 2.

We first analyse the general case where $\oplus$ is very weakly constrained. Then we
discuss some interesting combination modes in the next section. The only
requirements for $\oplus$ are the following properties:

i) $1 \oplus 1 = 1$;

ii) if $a \geq c$, $b \geq d$ then $a \oplus b \geq c \oplus d$ (strengthening).

The first one acknowledges the fact that if two sources agree that $\omega$ is fully
possible, then the result of the combination should confirm it. The second property
expresses that a possibility degree resulting from a combination cannot decrease if
the combined degrees increase. Note that $\oplus$ is not necessarily symmetric.

Let us first consider the syntactic counterpart of $\oplus$ (denoted by the same symbol for
the sake of clarity) when the combination is applied to two one-formula knowledge
bases:

Lemma 3: Let $\Sigma_1 = \{(\phi, a)\}$ and $\Sigma_2 = \{(\psi, b)\}$ be two one-formula knowledge bases.
Then:

$$\Sigma_\oplus = \Sigma_1 \oplus \Sigma_2 = \{(\phi, 1 - (1-a) \oplus 1)\} \cup \{(\psi, 1 - 1 \oplus (1-b))\} \cup \{(\phi \vee \psi, 1 - (1-a) \oplus (1-b)) \text{ if } \phi \vee \psi \neq \top\}.$$

We can check that $0 \leq 1 - (1-a) \oplus 1 \leq a$ and $1 - (1-a) \oplus (1-b) \geq 1 - (1-a) \oplus 1$. When
$1 - (1-a) \oplus 1 > 0$ then the result is composed of three parts, the two original formulas and
the common knowledge between the two sources, i.e., $\phi \vee \psi$ (if different from a
tautology).

Before generalizing Lemma 3, let us first introduce some definitions: 

Definition 4: Let $\Sigma_1 = \{(\phi_i, a_i) : i \in I\}$ and $\Sigma_2 = \{(\psi_j, b_j) : j \in J\}$, with $I = \{1, \dots, n\}$ and
$J = \{1, \dots, m\}$. We call $\{(\phi_i \vee \psi_j, f(a_i, b_j)) : i \in I \text{ and } j \in J, \text{ and } \phi_i \vee \psi_j \neq \top\}$ the common
knowledge of $\Sigma_1$ and $\Sigma_2$, with $f(a_i, b_j) \geq \min(a_i, b_j)$.

Indeed, the common knowledge of a set of knowledge bases is the information that
is entailed by all knowledge bases with the strength of belief never less than it is in
any knowledge base. Moreover, when $f(a_i, b_j) = \min(a_i, b_j)$ the common knowledge
is the largest set of information items that can be simultaneously entailed from all
the knowledge bases.

Definition 5: Let $\Sigma = \{(\phi_i, a_i) : i \in I\}$. We call $\{(\phi_i, g(a_i)) : i \in I\}$ the discounted
knowledge of $\Sigma$, where $g(a_i) \leq a_i$.

Discounting $\Sigma$ simply means reducing the weights associated with the beliefs in $\Sigma$
from $a$ to $g(a) \leq a$.

Lemma 3 can be extended to general possibilistic knowledge bases.

Proposition 2: Let $\Sigma_1 = \{(\phi_i, a_i) : i \in I\}$ and $\Sigma_2 = \{(\psi_j, b_j) : j \in J\}$, and let $\pi_1$ and $\pi_2$ be
their associated possibility distributions (using Definition 2). Then the knowledge
base associated with $\pi_\oplus = \pi_1 \oplus \pi_2$, denoted by $\Sigma_\oplus$, is composed of:

- the common knowledge of $\Sigma_1$ and $\Sigma_2$ with a weight
  $f(a_i, b_j) = 1 - (1-a_i) \oplus (1-b_j)$, and

- the discounted knowledge bases of $\Sigma_1$ and $\Sigma_2$ with the respective weights
  $g(a_i) = 1 - ((1-a_i) \oplus 1)$ and $g(b_j) = 1 - (1 \oplus (1-b_j))$.

More formally, we have:

$$\Sigma_\oplus = \{(\phi_i, 1 - (1-a_i) \oplus 1) : i \in I\} \cup \{(\psi_j, 1 - 1 \oplus (1-b_j)) : j \in J\} \cup \{(\phi_i \vee \psi_j, 1 - (1-a_i) \oplus (1-b_j)) : i \in I \text{ and } j \in J, \text{ and } \phi_i \vee \psi_j \neq \top\}.$$

Proposition 2 can be applied repeatedly in the case of n sources. If the sources play
symmetric roles and the operation $\oplus$ has a natural extension to n arguments (e.g., $\oplus$
is associative) then a formal expression giving the final result in one step can be
easily derived. Proposition 2 extends results of [3; 9] where only particular cases of
$\oplus$ are considered.
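The construction of Proposition 2 can be sketched and checked numerically on small bases. In the sketch below, which is an illustration only, a formula is represented by the set of its models over two atoms, so disjunction becomes set union. This representation and all function names are assumptions made here. The sample bases are $\{(q, .3), (q \vee r, .5)\}$ and $\{(r, .6)\}$.

```python
from itertools import product

ATOMS = ("q", "r")
OMEGA = list(product([True, False], repeat=len(ATOMS)))

def models(pred):
    """Set of interpretations (truth-value tuples) satisfying a predicate."""
    return frozenset(w for w in OMEGA if pred(dict(zip(ATOMS, w))))

def distribution(base):
    """Definition 2; a base is a list of (model_set, weight) pairs."""
    out = {}
    for w in OMEGA:
        falsified = [a for m, a in base if w not in m]
        out[w] = 1.0 - max(falsified, default=0.0)
    return out

def fuse(base1, base2, op):
    """Syntactic fusion following the shape of Proposition 2."""
    top = frozenset(OMEGA)
    fused = [(m, 1.0 - op(1.0 - a, 1.0)) for m, a in base1]    # discounted base 1
    fused += [(m, 1.0 - op(1.0, 1.0 - b)) for m, b in base2]   # discounted base 2
    fused += [(m1 | m2, 1.0 - op(1.0 - a, 1.0 - b))            # common knowledge
              for m1, a in base1 for m2, b in base2 if (m1 | m2) != top]
    return fused

sigma1 = [(models(lambda w: w["q"]), 0.3),
          (models(lambda w: w["q"] or w["r"]), 0.5)]
sigma2 = [(models(lambda w: w["r"]), 0.6)]

def check(op):
    """True if the fused base yields pi1 (op) pi2 on every interpretation."""
    pi1, pi2 = distribution(sigma1), distribution(sigma2)
    fused = distribution(fuse(sigma1, sigma2, op))
    return all(abs(fused[w] - op(pi1[w], pi2[w])) < 1e-9 for w in OMEGA)

operators = {
    "min": min,
    "max": max,
    "product": lambda x, y: x * y,
    "lukasiewicz": lambda x, y: max(0.0, x + y - 1.0),
}
for name, op in operators.items():
    print(name, check(op))
```

Each of the four operators satisfies the two requirements stated above, so the semantic and syntactic results are expected to agree on this example.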

The following sub-sections discuss particular syntactic cases of the combination
operator $\oplus$ which are semantically meaningful. The first one (idempotent
conjunction) is meaningful when the sources are consistent and dependent, the 
second one (idempotent disjunction) is appropriate when the sources are highly 
conflicting, the third one deals with independent sources, the fourth one is the usual 
weighted average, and the last one deals with the situation when the 
commensurability assumption is not acceptable. 

Idempotent conjunction: $\oplus = \min$

The first combination mode that we consider is the idempotent conjunction (i.e., the
minimum) of possibility distributions. Namely:

$$\forall \omega, \quad \pi_{ic}(\omega) = \min(\pi_1(\omega), \pi_2(\omega)). \qquad \text{(IC)}$$

Conjunctive aggregations make sense if all the sources are regarded as equally and
fully reliable since values that are considered as impossible by one source but
possible by all the others are rejected. To clarify this point of view, let us assume
that the two sources only provide incontrovertible information represented by two
classical formulas $\phi_1$ and $\phi_2$. These two formulas induce two binary valued
possibility distributions $\pi_1$ and $\pi_2$ such that each $\pi_i$ partitions the set of classical
interpretations into two subsets, namely $A_i$, the models of $\phi_i$, containing the
completely possible interpretations (i.e., $\forall \omega \in A_i$, $\pi_i(\omega) = 1$) and $\Omega - A_i$, the
counter-models of $\phi_i$, containing the completely impossible interpretations (i.e.,
$\forall \omega \in \Omega - A_i$, $\pi_i(\omega) = 0$). The result of the combination of the $\pi_i$ using (IC) leads to a partition of $\Omega$
into two subsets $(A_1 \cap A_2, \Omega - (A_1 \cap A_2))$, namely the models of $\phi_1$ and $\phi_2$, and the
counter-models of $\phi_1$ or $\phi_2$. The conjunction mode (IC) in this case is natural if
$A_1 \cap A_2$ is not empty (namely, if $\phi_1$ and $\phi_2$ are consistent).

An important issue with (IC) is the fact that the result may be subnormalized, i.e.,
it may happen that there is no $\omega$ such that $\pi_{ic}(\omega) = 1$. In that case it expresses a
conflict between the sources. Clearly the conjunctive mode makes sense if all the
$\pi_i$'s significantly overlap, i.e., $\exists \omega, \forall i, \pi_i(\omega) = 1$, expressing that there is at least
one value of $\omega$ that all sources consider completely possible. Besides, if two sources
provide the same information $\pi_1 = \pi_2$, the result is still the same distribution
(idempotency). Letting $\oplus = \min$ in Prop. 2, (IC) simply leads to the union of $\Sigma_1$ and
$\Sigma_2$ at the syntactic level, namely:

$$\Sigma_{ic} = \Sigma_1 \cup \Sigma_2.$$

It can be verified by noticing that $(\phi_i \vee \psi_j, \max(a_i, b_j))$ is either subsumed by $(\phi_i, a_i)$
(if $a_i \geq b_j$) or by $(\psi_j, b_j)$ (if $b_j \geq a_i$).

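Example 2:

Let $\Sigma_1 = \{(q, .3), (q \vee r, .5)\}$, $\Sigma_2 = \{(r, .6)\}$.

Using Definition 2, we have

$$\pi_1(qr) = \pi_1(q \neg r) = 1; \quad \pi_1(\neg q r) = .7; \quad \pi_1(\neg q \neg r) = .5.$$

$$\pi_2(qr) = \pi_2(\neg q r) = 1; \quad \pi_2(q \neg r) = \pi_2(\neg q \neg r) = .4.$$

Then:

$$\pi_{ic}(qr) = 1; \quad \pi_{ic}(\neg q r) = .7; \quad \pi_{ic}(q \neg r) = .4; \quad \pi_{ic}(\neg q \neg r) = .4,$$

and:

$$\Sigma_{ic} = \{(q, .3), (q \vee r, .5), (r, .6)\} = \{(q, .3), (r, .6)\}$$

(due to Lemma 1 since $(q \vee r, .5)$ is a subsumed belief).

We can easily check that $\pi_{ic}$ can be recovered from $\Sigma_{ic}$ using Definition 2.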



Conjunctions with reinforcing effect

The min-based combination mode has no reinforcement capability. Namely, the
result retains the smallest possibility degree for $\omega$, that corresponds to the most
informed sources. However if both experts consider $\omega$ as rather impossible, and
when these opinions are independent, then it may be reasonable to consider $\omega$ as
less possible than each source claims separately. More generally, if a set of
independent experts is divided into two unequal groups that disagree, we may want
to favor the opinion of the biggest group. This type of combination cannot be
modelled by any idempotent operation. What is needed is a reinforcement effect
which can be obtained using another operation. The most usual conjunctive ones
are the product and the so-called "Łukasiewicz t-norm":

$$\pi_{pro}(\omega) = \pi_1(\omega) \times \pi_2(\omega), \qquad \text{(Pro)}$$

$$\pi_{Luk}(\omega) = \max(0, \pi_1(\omega) + \pi_2(\omega) - 1). \qquad \text{(Luk)}$$

These combination modes are not idempotent, and they have two nice properties
which are associativity and commutativity. Their syntactic counterparts are directly
derived from Proposition 2:

$$\Sigma_{pro} = \Sigma_1 \cup \Sigma_2 \cup \{(\phi_i \vee \psi_j, a_i + b_j - a_i b_j) : i \in I, j \in J, \text{ and } \phi_i \vee \psi_j \neq \top\},$$

$$\Sigma_{Luk} = \Sigma_1 \cup \Sigma_2 \cup \{(\phi_i \vee \psi_j, \min(1, a_i + b_j)) : i \in I, j \in J, \text{ and } \phi_i \vee \psi_j \neq \top\}.$$



Note that (Luk) can still be defined with a finite scale, but not (Pro). Moreover, the
common knowledge $\phi_i \vee \psi_j$ is more strongly believed with (Luk) than with (Pro).

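Example 2 (continued):

We have:

$$\pi_{pro}(qr) = 1; \quad \pi_{pro}(\neg q r) = .7; \quad \pi_{pro}(q \neg r) = .4; \quad \pi_{pro}(\neg q \neg r) = .2,$$

$$\Sigma_{pro} = \Sigma_1 \cup \Sigma_2 \cup \{(q \vee r, .72), (q \vee r, .8)\} = \{(q, .3), (q \vee r, .8), (r, .6)\}$$

(after removing subsumed formulas), and

$$\pi_{Luk}(qr) = 1; \quad \pi_{Luk}(\neg q r) = .7; \quad \pi_{Luk}(q \neg r) = .4; \quad \pi_{Luk}(\neg q \neg r) = 0,$$

$$\Sigma_{Luk} = \Sigma_1 \cup \Sigma_2 \cup \{(q \vee r, .9), (q \vee r, 1)\} = \{(q, .3), (q \vee r, 1), (r, .6)\}.$$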

Clearly, we have a reinforcement effect. Indeed, the common knowledge, i.e. the 
belief qvr, which is supported by the two sources, is entailed with a degree higher 
than if the idempotent conjunction (IC) is applied. 

Minimally weighted common knowledge: 

When the information provided by the sources is highly conflicting, then $\oplus = \max$
(called here disjunctive aggregation) can be more appropriate than the previous
conjunctive aggregation. Namely define:

$$\forall \omega, \quad \pi_{id}(\omega) = \max(\pi_1(\omega), \pi_2(\omega)). \qquad \text{(ID)}$$

The disjunctive aggregation corresponds to a weaker reliability hypothesis, namely, 
in the group of sources there is at least one reliable source for sure, but we do not 
know which one. 

Suppose that we only have two binary valued possibility distributions $\pi_1$ and $\pi_2$ such that
each $\pi_i$ partitions the set of classical interpretations into two subsets, namely $A_i$
containing the completely possible interpretations (i.e., $\forall \omega \in A_i$, $\pi_i(\omega) = 1$) and $\Omega - A_i$
containing the completely impossible interpretations (i.e., $\forall \omega \in \Omega - A_i$,
$\pi_i(\omega) = 0$). The result of the combination of the $\pi_i$ using (ID) leads to a partition of $\Omega$ into
two subsets $(A_1 \cup A_2, \Omega - (A_1 \cup A_2))$ respectively.

The disjunction mode (ID) in this case is natural if $A_1 \cap A_2$ is empty. Besides, if
two sources provide the same information $\pi_1 = \pi_2$, the result of the disjunctive
combination is still the same distribution. And if the information provided by a
source $k$ is less specific than the information given by the other(s) then $\pi_{id} = \pi_k$.

Letting $\oplus = \max$ in Prop. 2, the syntactic counterpart of (ID) is:

$$\Sigma_{id} = \{(\phi_i \vee \psi_j, \min(a_i, b_j)) : i \in I \text{ and } j \in J \text{ and } \phi_i \vee \psi_j \neq \top\}.$$

Note that $\Sigma_{id}$ is always consistent (provided that $\Sigma_1$ or $\Sigma_2$ is consistent). Moreover,
a formula $\psi$ is a plausible consequence of $\Sigma_{id}$ if $\psi$ is inferred both from $\Sigma_1$ and $\Sigma_2$.
This corresponds to the idea of "All support" protocol defined in [2].

Example 2 (continued): We have:

$$\pi_{id}(qr) = \pi_{id}(\neg q r) = \pi_{id}(q \neg r) = 1; \quad \pi_{id}(\neg q \neg r) = .5,$$

and:

$$\Sigma_{id} = \{(q \vee r, .3), (q \vee r, .5)\} = \{(q \vee r, .5)\}.$$

Clearly, when the information provided by the sources is consistent, then (ID) does 
not recover all the information given by each source. For instance, in this example q 
is not obtained while it is recovered using the conjunctive approaches. 

Weighted average 

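Let $\lambda_1$ and $\lambda_2$ be two non-negative real numbers such that $\lambda_1 + \lambda_2 = 1$. Then:

$$\forall \omega, \quad \pi_{wa}(\omega) = \lambda_1 \times \pi_1(\omega) + \lambda_2 \times \pi_2(\omega). \qquad \text{(WA)}$$

Its syntactic counterpart is:

$$\Sigma_{wa} = \{(\phi_i, \lambda_1 \times a_i) : i \in I\} \cup \{(\psi_j, \lambda_2 \times b_j) : j \in J\} \cup \{(\phi_i \vee \psi_j, \lambda_1 \times a_i + \lambda_2 \times b_j) : i \in I \text{ and } j \in J \text{ and } \phi_i \vee \psi_j \neq \top\}.$$

Note that if $\lambda_1 = 1$ then $\Sigma_{wa} = \Sigma_1$, as expected.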

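It is worth noticing that when $\Sigma_1 \cup \Sigma_2$ is consistent then:

$\Sigma_{id} \vdash (\phi, a)$

$\Rightarrow \Sigma_a \vdash (\phi, b)$ with $b \geq a$

$\Rightarrow \Sigma_{ic} \vdash (\phi, c)$ with $c \geq b$

$\Rightarrow \Sigma_{pro} \vdash (\phi, d)$ with $d \geq c$

$\Rightarrow \Sigma_{Luk} \vdash (\phi, e)$ with $e \geq d$.

$\Sigma_a$ is the combination with the particular case of weighted average $\lambda_1 = \lambda_2 = 1/2$.
Roughly speaking it means that, e.g., each belief obtained from $\Sigma_{id}$ can also be
obtained from $\Sigma_a$ with a strength greater than or equal to the original one. However, when $\Sigma_1 \cup \Sigma_2$
is inconsistent then the above implications do not hold [6].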
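The chain above can be observed on the two distributions of Example 2, whose bases are jointly consistent. The sketch below works at the semantic level only and reads "$\Sigma \vdash (\phi, a)$" as "the necessity of $\phi$ under the fused distribution is at least $a$". That reading, the numeric encoding of interpretations as `(q, r)` pairs, and the function names are assumptions made for this illustration.

```python
# Possibility degrees of Example 2, keyed by (q, r) truth values (1 = true).
pi1 = {(1, 1): 1.0, (1, 0): 1.0, (0, 1): 0.7, (0, 0): 0.5}
pi2 = {(1, 1): 1.0, (1, 0): 0.4, (0, 1): 1.0, (0, 0): 0.4}

modes = {
    "id":  max,                                  # idempotent disjunction
    "a":   lambda x, y: 0.5 * x + 0.5 * y,       # average, both weights 1/2
    "ic":  min,                                  # idempotent conjunction
    "pro": lambda x, y: x * y,                   # product
    "luk": lambda x, y: max(0.0, x + y - 1.0),   # Lukasiewicz t-norm
}

def necessity(pi, formula):
    """N(phi) = 1 - highest possibility degree of a counter-model of phi."""
    counter = [p for w, p in pi.items() if not formula(*w)]
    return 1.0 - max(counter, default=0.0)

def q_or_r(q, r):
    return bool(q or r)

degrees = {}
for name, op in modes.items():
    fused = {w: op(pi1[w], pi2[w]) for w in pi1}
    degrees[name] = round(necessity(fused, q_or_r), 2)
print(degrees)
```

For the belief $q \vee r$ the degrees come out in non-decreasing order from the disjunctive mode to the Łukasiewicz mode, matching the ordering of the implications.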



3.2. Prioritized fusion without commensurability 

The last combination mode that we consider is the prioritized fusion when 
commensurability assumption is not made. All the previous approaches use this 
assumption by sharing the same understanding or meaning of the scale [0,1]. 

The approach that we consider in this section is well known in social choice theory
under the name "dictator". The idea is to refine one ranking by the other. More
precisely, let $\pi_1$ and $\pi_2$ be two possibility distributions. Assume that $\pi_1$ has
priority over $\pi_2$. The result of combination, denoted by $\pi_{pf}$, is obtained from the
following postulates:

i) if $\pi_1(\omega) > \pi_1(\omega')$ then $\pi_{pf}(\omega) > \pi_{pf}(\omega')$

ii) if $\pi_1(\omega) = \pi_1(\omega')$ then $\pi_{pf}(\omega) \geq \pi_{pf}(\omega')$ iff $\pi_2(\omega) \geq \pi_2(\omega')$

Clearly the combination result is simply the refinement of $\pi_1$ (the dictator) by $\pi_2$. It
consists of taking all conclusions given by $\pi_1$ and adding as many conclusions as
possible (w.r.t. the consistency criterion) from $\pi_2$.
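Since no common scale is assumed, only the resulting order matters, and the two postulates amount to a lexicographic comparison. The following sketch illustrates this on made-up data: the interpretation names `w1` to `w4` and their degrees are hypothetical, as is the function name.

```python
# Hypothetical degrees; pi1 is the dictator, pi2 only breaks its ties.
pi1 = {"w1": 1.0, "w2": 1.0, "w3": 0.7, "w4": 0.7}
pi2 = {"w1": 0.4, "w2": 1.0, "w3": 0.9, "w4": 0.2}

def prioritized_ranking(pi1, pi2):
    """Order interpretations by pi1 first, then by pi2 (most plausible first)."""
    return sorted(pi1, key=lambda w: (pi1[w], pi2[w]), reverse=True)

ranking = prioritized_ranking(pi1, pi2)
print(ranking)
```

Any numerical assignment that is strictly decreasing along this ranking, with equal values for interpretations tied on both distributions, satisfies postulates i) and ii).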

Let $\Sigma_1 = \{(\phi_i, a_i) : i = 1, \dots, n\}$ and $\Sigma_2 = \{(\psi_j, b_j) : j = 1, \dots, m\}$ such that $a_i > a_{i+1}$ for
$i < n$, and $b_j > b_{j+1}$ for $j < m$. Then the syntactic counterpart is:

$$\Sigma_{pf} = \{(\phi_1 \vee \psi_1, x_{11}), \dots, (\phi_1 \vee \psi_m, x_{1m}), \dots, (\phi_n \vee \psi_1, x_{n1}), \dots, (\phi_n \vee \psi_m, x_{nm}), (\phi_1, y_1), \dots, (\phi_n, y_n), (\psi_1, z_1), \dots, (\psi_m, z_m)\},$$

where the $x_{ij}$, $y_i$, $z_j$ define any numerical assignment obeying the following
constraints:

$x_{ij} > x_{ik}$ (resp. $y_j > y_k$ and $z_j > z_k$) for $j < k$, and for any $i$,
$x_{ij} > x_{kl}$ for $i < k$, and for any $j, l$, and
$x_{ij} > y_k > z_l$ for any $i, j, k, l$.

When $\pi_1$ is a binary possibility distribution encoding some classical formula $\phi$,
then fusing $\pi_1$ with $\pi_2$ leads to the revision process of $\pi_2$ by $\phi$ proposed by Papini
[16], hinted by Spohn [18], and addressed syntactically by Benferhat, Dubois, Papini
[5]. See also [13].



3.3. Merging in kappa-functions framework 

Similar results can be obtained in the kappa functions setting with the help of Table
1. Let $*$ be a two place function whose domain is $\mathbb{N} \times \mathbb{N}$ which aggregates two
kappa functions into a new one $\kappa_*$. Again, the only requirements for $*$ are
monotonicity and $0 * 0 = 0$. Let $K_1 = \{(\phi_i, k_i) : i \in I\}$ and $K_2 = \{(\psi_j, k_j) : j \in J\}$, and $\kappa_1$ and
$\kappa_2$ their associated $\kappa$-functions computed by (##). Then we have the following result
similar to the one of Proposition 2:

Proposition 3: Let $\kappa_* = \kappa_1 * \kappa_2$. Then, $\kappa_*$ is associated with:

$$K_* = \{(\phi_i, k_i * 0) : i \in I\} \cup \{(\psi_j, 0 * k_j) : j \in J\} \cup \{(\phi_i \vee \psi_j, k_i * k_j) : i \in I \text{ and } j \in J\}.$$

When $*$ is a counterpart of $\oplus$ in the transformation which associates a kappa
function $\kappa$ with a possibility distribution $\pi_\kappa$ (see Table 1), e.g., $*$ = addition and
$\oplus$ = product, then the counterpart of $K_*$ is $\Sigma_\oplus$ (more details can be found in [6]).
However, there exist operations $\oplus$ closed in [0,1] (e.g., weighted average) without a
counterpart closed in $\mathbb{N}$.
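For the pair addition/product mentioned above, the correspondence can be written out using only the transformation of Table 1. The short derivation below is a worked check under that transformation, with $a_i = 1 - 2^{-k_i}$ and $b_j = 1 - 2^{-k_j}$ denoting the possibilistic weights obtained from the ranks of $K_1$ and $K_2$ (this notation is introduced here for the illustration).

```latex
\begin{align*}
\pi_{\kappa_*}(\omega)
  &= 2^{-(\kappa_1(\omega) + \kappa_2(\omega))}
   = 2^{-\kappa_1(\omega)} \times 2^{-\kappa_2(\omega)}
   = \pi_{\kappa_1}(\omega) \times \pi_{\kappa_2}(\omega), \\
1 - 2^{-(k_i + 0)} &= 1 - (1 - a_i) \times 1 = a_i, \\
1 - 2^{-(k_i + k_j)} &= 1 - (1 - a_i)(1 - b_j) = a_i + b_j - a_i b_j .
\end{align*}
```

The last two lines are the weights of the discounted and common knowledge of the product-based fusion, so the translation of $K_*$ for $* = +$ has the same weights as the product-based base.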

3.4. Related works 

Baral et al. [1] view the fusing of knowledge bases $\{K_1, \dots, K_n\}$ as the problem of
restoring the coherence of $K_1 \cup \dots \cup K_n$. Basically, they define plausible conclusions
of fused bases as those which are entailed from all maximally consistent subbases of
$K_1 \cup \dots \cup K_n$. Benferhat et al. [4] proposed a symbolic fusion operator which is
closely related to the one based on the product operator. They show that this operator
recovers several coherence-based approaches, namely those based on selecting
consistent subbases. Hence, using their results, the techniques proposed here for
fusion can also be applied for restoring coherence of $\{K_1, \dots, K_n\}$. For instance, if
one wants to recover the inference based on all maximally consistent subbases with
a maximal cardinality, here called Card-consequence relation, then we let $\Sigma_i = \{(\phi_i, a)\}$
such that $\phi_i$ is the conjunction of all formulas in $K_i$, and $0 < a < 1$. Let $\pi_{pro}$ be the
result of fusing the $\pi_i$'s with the product operator. Then we can check that: $\psi$ is a
Card-consequence iff $\forall \omega$ s.t. $\pi_{pro}(\omega) = 1$, $\omega \in [\psi]$.

Lin [14], Revesz [17] propose a fusing operator based on Dalal's distance [8]. Given a
set of non-prioritized knowledge bases $\{K_1, \dots, K_n\}$ and a weighting $c(K_i)$ which
associates a non-negative number to each $K_i$, they construct a total order $\leq_L$ on
$\Omega$ as follows:

• define $dist(\omega, \omega')$: the number of symbols whose valuation differs in the two
interpretations,

• define $dist(\omega, K_i) = \min\{dist(\omega, \omega') : \omega' \in [K_i]\}$,

• Lastly, $\omega \leq_L \omega'$ iff $\sum_{i=1,n} dist(\omega, K_i) \cdot c(K_i) \leq \sum_{i=1,n} dist(\omega', K_i) \cdot c(K_i)$.
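The three steps above can be sketched directly. The example is illustrative only: the three bases, their weights, and the representation of a base by the list of its models are assumptions made here, and each base is assumed consistent so that the minimum in the second step is defined.

```python
from itertools import product

ATOMS = ("p", "q", "r")
OMEGA = list(product([False, True], repeat=len(ATOMS)))

def dist(w1, w2):
    """Number of atoms whose valuation differs between two interpretations."""
    return sum(x != y for x, y in zip(w1, w2))

def dist_to_base(w, base_models):
    """Distance from w to the closest model of a (consistent) base."""
    return min(dist(w, m) for m in base_models)

def weighted_cost(w, bases):
    """Sum of dist(w, K_i) * c(K_i); a lower cost ranks w earlier."""
    return sum(dist_to_base(w, base_models) * c for base_models, c in bases)

# Hypothetical bases, given by their models over (p, q, r), with weights.
k1 = [w for w in OMEGA if w[0] and w[1]]   # p and q
k2 = [w for w in OMEGA if not w[0]]        # not p
k3 = [w for w in OMEGA if w[2]]            # r
bases = [(k1, 2), (k2, 1), (k3, 1)]

ranking = sorted(OMEGA, key=lambda w: weighted_cost(w, bases))
best = ranking[0]
print(best, weighted_cost(best, bases))
```

Here the interpretation making p, q and r all true is ranked first: it contradicts only the base asserting "not p", whose weight is the smallest involved.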

Clearly, this approach differs from ours: indeed we do not induce the same ranking
between different interpretations from each knowledge base, since our approach is
not based on Dalal's distance but on an uncertainty measure between the beliefs of each base.
Obviously, if $\pi_{K_i}(\omega)$ is defined proportionally w.r.t. $dist(\omega, K_i)$ then $\leq_L$ can be
recovered using weighted average combination modes. Moreover, if each $K_i$ is a set
of literals, then define for each $l_{ij} \in K_i$ the possibilistic base $\Sigma_{ij} = \{(l_{ij}, a_i)\}$, with
$a_i = c(K_i) / (\eta \cdot |K_i|)$, where $\eta = \sum_{i=1,n} c(K_i)$. Then, denoting $\pi_{pro}$ the result of fusing
the $\pi_{ij}$'s, we can show that:

$$\omega \leq_L \omega' \text{ iff } \pi_{pro}(\omega) \geq \pi_{pro}(\omega').$$

Lastly, Konieczny and Pino-Pérez [13] have proposed some properties for
characterizing fusion operators. In particular, the two following ones have been used
to capture respectively the principles of majority and that of arbitration.

Let $E = \{K_1, \dots, K_n\}$, and $\Delta$ be a fusing operator. Then $\Delta$ is a majority operator if:

$$\forall K, \exists n, \text{ s.t. } \Delta(E \cup K^n) \vdash K,$$

and $\Delta$ is an arbitration if:

$$\forall K, \forall n, \quad \Delta(E \cup K^n) = \Delta(E \cup K),$$

where $K^n$ means that the information is provided $n$ times by sources equivalent to $K$.
We can check that the product-based combination mode is a majority operation, and
the min or max-based combination modes are types of arbitration.

4. Conclusion 

This paper offers a simple generic tool for merging prioritized knowledge bases in a 
meaningful way. We defined notions of common knowledge and discounted 
knowledge which were subsequently used to describe the process of fusion. In 
particular, merging operators for consistent and dependent, highly conflicting, or 
independent sources were developed. In addition, reinforcement effects and the lack of 
commensurability were addressed. These operations can be easily implemented in 
practice, although the computational complexity of fusion operations in the general 
case remains to be studied.

Abstract. In this paper a fuzzy behavior based decision control is presented.
A behavior is described by one or more fuzzy controllers reacting
to external stimuli, and a state variable (denoted behavior activity)
measuring how evident the conditions of application of the knowledge
coded in the fuzzy controllers are. The autonomous vehicle decision
control is composed of several behaviors of this kind, each one feeding
an arbitrator with the activity and action values. The arbitrator evaluates
the final action to send to the actuators taking into consideration each
behavior's activity and action values. Two arbitration methods are proposed.
Results of simulations are presented and conclusions are drawn.

Keywords: autonomous vehicles, Khepera, fuzzy controller, fuzzy behavior,
fuzzy behavior arbitration.



1 Introduction 

The domain of application of fuzzy control is now large. It can be applied in
situations where smoothness of actuation is important (e.g. Anti-lock Braking
Systems in automobiles), where it is difficult to express knowledge in an
analytical way (high system complexity), where the variable domains are qualitative
in nature (e.g. water purification), etc. In all these situations the gains are:
robustness (in the sense that the controller is applicable to situations not explicitly
declared), project efficiency (the rule base determination is less complicated than
the determination of some analytical model, e.g., high order differential
equations), wide application domain (the generality of application of fuzzy control
is now accepted) and “handiness” of thinking (the rule set partly implements a
divide-and-conquer problem resolution approach), among others.

All these issues are related to the control of autonomous vehicles at several
levels of design. The simplest one is piloting in environments with obstacles. In
fact, the rate of wheel curvature depends in a qualitative way on the rate
and the angle of approach of the vehicle to some obstacle. As another example,
the linear and differential velocities of some transportation vehicle depend on
the weight and shape of the load and how the objects are accommodated in the
vehicle. For some external observer and for both illustrative situations, it seems
that this is the human modus faciendi. Informally, this last statement shows
the adequacy of fuzzy logic to guiding/driving control.

The control of an autonomous vehicle cannot be reduced to the guiding task
([3]). There are other modules that must be addressed, which can be quite
complex m, H)- Let's assume that the objectives of some vehicle are: to have a
wander behavior, and to have a self-looking-for-energy behavior. Consider also
that the energy source is somewhere near the walls. A wall following behavior
can be associated with the self-looking-for-energy behavior. The projected
global behavior is as follows: the vehicle wanders when the energy level is “high”
enough, and starts looking for energy supply when needed, i.e. it looks for and
follows walls until it finds energy. The question is: is it possible to have a single fuzzy
controller taking care of everything? Setting aside the operators (such as
aggregation and composition operators, defuzzification, etc.), the main component of a
fuzzy controller is the knowledge base, which is composed of rules. In a typical
situation, all rules depend on the same input variables and influence the same
output variables; they are one of a kind. The domain of the input variables imposes
different contexts, and so some rules will have more importance on the outputs
than others. Even if one uses both positive (conditions to satisfy) and negative
knowledge (conditions that must not be satisfied), in practice such a centralized
controller will have such a dimension that it will be difficult to design, either in
terms of knowledge elicitation or performance tuning.

In fact, in the traditional approach to fuzzy control (lin],!ii!), the controller 
is made up of a knowledge base which is usually a monolithic computational 
entity; it works like a mathematical function where one feeds the inputs and 
receives the outputs. The controller design is finished as soon as there is some 
kind of satisfaction about its performance. As so, it can be seen as an indivisible 
entity, because there is no use for the discrete parts (the rules). This does not 
imply that the knowledge base (or other parameters, e.g., membership functions) 
could not be modified. This is indeed the case when there is the need to adjust 
on-line the performance of the controller (i .|16]1. It happens in situations where 
some parameters of the system change, and the controller has to be modified to 
track the new behavior of the system. 

The size of the controller depends heavily on the quantity (number of input 
variables) and granularity (number of membership functions per variable) of the 
inputs. Although the process of rule determination is modular, meaning that 
when “thinking” in some rule one should not be too much concerned about the 
others (one thinks in each rule separately), it is known that the expressive power 
of fuzzy logic decreases with the increase in the knowledge base size. 

Building a big centralized monolithic machine that has the capacity of doing
many things might not be the best way to address a difficult problem. On the
other hand, the decentralized (modular) way of thinking, or the bottom-up
approach m) advocated by artificial life, could facilitate the construction of such
a controller, leading to a behavior based architecture. So, “keep the knowledge
base modular” and “find a fuzzy controller for each behavior” (or module of the
application) can be seen as good heuristics for solving large problems through
fuzzy logic.

A new question then arises: how can simultaneous knowledge bases be
articulated, each one showing some kind of competence different from the others?
This is the issue of the arbitration problem, which is central in behavior based
autonomous vehicles and also central in the study reported in this paper.
Concretely, the experience of controlling a Khepera mobile robot is presented
(see Fig. 1).

The organization of the document is as follows. In Sect. 2 some approaches to 
the application of fuzzy logic in autonomous vehicles are referenced, pointing out 
the connection to the work reported here, when applicable. In Sect. 3 the fuzzy 
behavior thematic is introduced; mainly, the constitution of a fuzzy behavior is 
presented. In Sect. 4 the integration of several fuzzy behaviors or the arbitration 
problem is considered. In Sect. 5 the results of the conducted experiments are
presented. Finally, in Sect. 6 some conclusions are drawn and directions for
future research are pointed out. 




Fig. 1. Mobile robot Khepera with its eight infrared distance sensors (labeled from 0 
to 7) and two motors 



2 Relations to Other Works 

Good references to start on fuzzy logic and to revisit still are m and m- 
A use of fuzzy control for the steering of a mobile vehicle can be found
in . Other uses of fuzzy control in the classical sense, with a comparison
with neural control, for navigation of a mobile platform can be found in and

m- 

Implementations of adaptive fuzzy control applied to motor control and to
navigation can be found in and [16], respectively.

An implementation of distributed fuzzy control with applications to mobile 
robotics can be found in [S|, with more focus on physical distributedness of com- 
putation rather than competence distributedness, as is favored here. Similarities 
with this work can be found in and [3- Namely, the work reported in
embraces a larger domain of application than the one reported here. A combination
between two separated fuzzy controllers (one for each actuator) and non-fuzzy 
controllers in a navigation system can be found in [^. The major resemblance 
is with the work of Saffiotti, Ruspini and Konolige, which can be appreciated 
in several references, such as E. m and m- This work extends to other
problems beyond navigation, such as task planning.

3 Fuzzy Behaviors 

A behavior is considered here as the reaction of the vehicle in the presence of some
condition (environmental, in the typical situation). Why should fuzzy behaviors
be used? First of all, the term fuzzy refers to fuzzy logic, i.e., to how the
behaviors are implemented, and not to the behaviors themselves. Fuzzy
logic is used for several reasons; among them are:

— a way to get a smooth reaction to the environment (the vehicle turns as it
gets closer to some obstacle);

— it is relatively easy to express the knowledge to do just that;

— an adequate representation for declaring knowledge which is:

   — uncertain (the sensors do not deliver exact data, due to obstacle color
   and texture, dependency on operating temperature, etc.),

   — incomplete (the information captured by sensors is local and not global,
   because the environment is scanned inside a tight 3-D cone), and

   — approximate (if there is imprecision, then the data is of an approximate
   nature only).

In spite of these attributes, fewer works on decision control in autonomous
vehicles use fuzzy logic than do without it. This may happen
because the main problems are the autonomy of the vehicle decision control and
its level of competence, and not only the trajectory smoothness, for instance.
However, the bigger the vehicle, the more attention should be paid to the
smoothness of the trajectories, if one wishes to use the batteries safely and
efficiently and to avoid breaking any electrical or mechanical part. In small vehicles there is no
inconvenience in using threshold values to switch from low velocities to higher
ones, to make tight bends or to stop suddenly; this is the case of the Khepera
mobile robot.

One should not confuse the anticipation of a bend (smoothness in turning) with
fuzzy logic. This anticipation will be achieved if the sensor types and their
placement allow it. In the case of Khepera, the sensors are not very good in terms of
accuracy, range and linearity, but they are good enough for the goal at hand.

Fuzzy logic provides smoothness in actuation, in the sense that, no
matter the amount of stimulus, there is always a corresponding amount of
actuation. Most of the time, this translates into some degree of trajectory
smoothness. As a consequence, the sensitivity of the reaction is higher with fuzzy
logic.

3.1 Constitution of a Fuzzy Behavior

For the time being the simplest form of behavior representation is considered, 
whose block diagram is illustrated on the left side of Fig. 2. It is similar to the
one proposed in . The fuzzy controller (FC) has the ability to create action,
according to the designer’s intention. The knowledge declared in FC should code
some particular competence necessary to the performance of the vehicle; exam- 
ples used in this work are: how to drive in the direction of light and how to avoid 
left, right and front obstacles. FC receives input from the environment through 
the sensors and creates action through the actuators. It is also possible to use 
the notion of fuzzy state of the vehicle either as input or output variables. 



[Figure 2. Left panel: block diagram (Input variables → state variable → Activity; Input variables → FC → Action). Right panel: membership function plot; horizontal axis: light sensor return value, 0 to 1000.]



Fig. 2. Left: Behavior block diagram. Right: Shapes used in the experiments for the 
state variable membership functions 



It is a special feature of the proposed approach that each behavior does not 
depend on others, at least during the design process. In this sense, each behavior 
knowledge base is determined only in terms of the type of (qualitative) response 
the behavior shall produce for each (qualitative) environmental condition. When 
constructing the rule base it should be clear for the designer what the vehicle 
must do when the behavior being constructed is active. 

The state variable block on the left side of Fig. 2 is composed of a fuzzy predicate
and the corresponding membership function, whose purpose is to measure
how evident are the conditions of application of the behavior action in the pre- 
sent state of the environment, by returning a value belonging to the interval 
[0,1]. In other words, if it is considered that the behavior fuzzy controller co- 
des some competence or know-how, then the state variable measures the degree 
of membership of the environment state to the conditions of application of the 
fuzzy controller. The environment conditions to satisfy are the ones that mat- 
ter from the point of view of the behavior competence. The block input could 
refer to sensor values (reflecting the state of the environment) or to internal 
conditions (reflecting the state of the vehicle). By measuring how important
some environmental or internal conditions are, the activity value informs the
final actuator evaluation module (the arbitrator) of how much the action values
supplied by the behavior should be taken into account. It is up to the arbitrator
to decide, taking into consideration the action and activity values delivered by each
behavior. Note that the state variable and FC do not influence each other.

It is also desired that the global conduct of the vehicle be dependent on some
condition. Assume that this condition is internal to the vehicle, as in the
following situation: as soon as the energy level drops below the energy threshold
value, the vehicle should look for energy. This is accomplished by knowing that
the re-charging station is somewhere near a wall; so the vehicle starts to follow
walls as soon as it finds one. It can be said that the vehicle’s global behavior
shifted from a wander behavior to a wall-follow behavior. What does this
situation have to do with the fuzzy behavior proposed so far? The issue is that the
vehicle should treat left, right and front obstacles differently from the way it did
before (when it was wandering). So the fuzzy controllers in avoid-left-obstacles,
avoid-right-obstacles and avoid-front-obstacles must be different. This is
achieved by letting more than one fuzzy controller reside inside the action module.
Finally, there is the need to switch between the several FCs available. This kind
of rotary switch depends on some variable; in this case an internal state variable,
namely energy. The new block diagram of a fuzzy behavior with more than one
way of implementing its competence can be found in Fig. 3. It should be noted
that the capacity to choose the active fuzzy controller resides in the arbitration
module (the details are presented in Sect. 4).




Fig. 3. Block diagram of the extended behavior 



3.2 A Fuzzy Behavior Example 

An example of an avoid-front-obstacles behavior is considered. The fuzzy
controller has two input variables (front-distance and
difference-in-sensor-readings) and two output variables (left-motor and right-motor). Each
variable is defined by three fuzzy terms, which are: far, medium and close
for input variable front-distance; negative, zero and positive for input variable
difference-in-sensor-readings; and forward, stop and backward for both left-motor
and right-motor output variables.

The membership functions of each fuzzy term associated with each input variable
are presented in parts a) and b) of Fig. 4 (observe that sensor values range
from 0 to 1023). The same happens in part c) in relation to the output variables
(observe that motors should receive any integer value from -10 to 10). The
shapes of the membership functions were adjusted by
hand so as to compensate for the non-linearity of the infrared distance sensors
in Khepera.



[Figure 4. Part A) Distance Input Variable, horizontal axis: sensor return value (log scale, 10 to 10000). Part B) Difference Input Variable, horizontal axis: difference in sensor returned values (-600 to 600). Part C) Output Variable, horizontal axis: actuator applied value (-10 to 10). Vertical axis: 0 to 1.]





Fig. 4. Membership function definitions: part a) far, medium and close for input
variable front-distance (note the log scale on the x-axis), part b) positive, zero and negative
for input variable difference-in-sensor-readings and, finally, part c) backward, stop and
forward for output variables left-motor and right-motor



The crisp input values are evaluated from the values of sensors ‘2’ and ‘3’
(recall Fig. 1) using expressions (1) and (2). In the case of Khepera, the two front
sensors are parallel, and lower sensor values represent longer distances. So when
there is the need to know the front distance it is safer to take the maximum of
both front sensors. Also, the right side of expression (2) is a simple way to have a
measure of the approach angle to an obstacle.

$x_{dist} = \max(sensor[2], sensor[3])$ . (1)

$x_{dif} = sensor[2] - sensor[3]$ . (2)

The activity value is a function of sensors ‘2’ and ‘3’ because these are the only
ones pointing to the front. The activity is evaluated by applying the value returned
by expression (1) to a sigmoid-like function, which returns a value in the
[0,1] interval. This function is labeled “sigmoide” on the right side of Fig. 2. Thus
the satisfaction of every predicate is measured in each iteration step.

Finally, the knowledge base is represented in tabular form in Fig. 5. As an
example of how to read the table, consider the rule in the top-left rectangle: “if
distance is far and difference-in-sensor-readings is negative, then left-motor is
forward and right-motor is forward”.

4 Fuzzy Behavior Arbitration 

Behavior arbitration is a central topic in autonomous vehicles, but its
counterpart in fuzzy logic, which can be called controller arbitration, is not so
common. The question behind arbitration is how to decide when there are several
actions that can be performed.

In the non-fuzzy approach to behavior arbitration (the crisp way), the
switching process depends on some threshold values, which means that the vehicle
tends to react the same way in some subset of input conditions. In a sense, the
threshold values void the richness of the environment, meaning that the
increasing evidence of some environmental feature is neglected until it is considered to
be strong enough. This is where fuzzy logic comes in.

Fig. 5. Knowledge base for avoid-front-obstacles fuzzy controller, where F, S and B
represent forward, stop and backward, respectively; the subscript l denotes left and r
denotes right

What is fuzzy behavior arbitration? It is considered as a way to decide what
the actuators shall receive, taking into account the action and the relative
importance of each behavior (the activity value presented earlier). Because these
values carry no information about the outcome of each action (namely, some
contradictory actions could be sent to the arbitrator), some internal state of the
vehicle is also considered in the decision process. This is a rather general
definition that will be clarified shortly. In the time frame of this work,
two methods of behavior arbitration were evaluated: “weight” and “max”, which
are in essence defuzzification operators. These are addressed next. The effect
of the vehicle’s internal state will be considered in Sub-sects. 4.2 and 4.3 and
also in Sect. 5.

4.1 Two Arbitration Methods 

A kind of defuzzification process is used in two different ways, whose expressions
can be formulated as follows: let $N$ denote the number of behaviors, $M^j$ denote
the $j$-th motor ($j$ = left, right), $S_i$ be the value of the $i$-th behavior state variable
and $A_i^j$ be the $i$-th behavior action for $M^j$, then:

— Arbitration by Weight

$M^j = \frac{\sum_{i=1}^{N} S_i A_i^j}{\sum_{i=1}^{N} S_i}$, $j$ = left, right . (3)

— Arbitration by Max

$M^j = A_i^j : S_i = \max_{i=1 \ldots N} S_i$, $j$ = left, right . (4)

The intuitive arguments behind each mode are the ones used for the
defuzzification methods in fuzzy logic. In the first case, it is considered that
every behavior is important, and so they must all be considered in the actuators’
final value. In the last case, total importance is given to the behavior showing
maximum activity, as some research work based on ethology seems to consider
([20], [13] and [15] are some examples). The results of both evaluation methods
are presented in Sect. 5.
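
The two arbitration modes can be illustrated with a short Python sketch. It rests on assumptions that should be read as such: “weight” is taken here to be an activity-weighted average of the behavior actions (one plausible reading of a defuzzification-like operator in which every behavior counts), ties in “max” go to the first behavior, and the function names and the zero output for an all-zero activity vector are hypothetical choices.

```python
MOTORS = ("left", "right")

def arbitrate_weight(activities, actions):
    """Activity-weighted average of the behavior actions (assumed form of 'weight')."""
    total = sum(activities)
    if total == 0:
        return {m: 0.0 for m in MOTORS}  # assumption: no active behavior, motors stop
    return {m: sum(s * a[m] for s, a in zip(activities, actions)) / total
            for m in MOTORS}

def arbitrate_max(activities, actions):
    """The action of the behavior with the highest activity is used as is."""
    k = max(range(len(activities)), key=lambda i: activities[i])
    return dict(actions[k])

# Two behaviors: one wants to go straight, the other wants to turn.
activities = [0.2, 0.8]
actions = [{"left": 10, "right": 10}, {"left": -5, "right": 5}]
print(arbitrate_weight(activities, actions))
print(arbitrate_max(activities, actions))
```

With “weight” the weaker behavior still shifts the motor commands, whereas with “max” it is ignored entirely.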



4.2 Behavior Inhibition 

As soon as the behavior activity value is different from zero, it can be said that
the behavior is active. Depending on the arbitration method, this behavior
activity could influence how the arbitrator commands the actuators. Apart from
these two facts, nothing has been said about the number of behaviors that are 
permitted to be active independently of their activity values. In other words, 
a situation in which there is the need to disregard some behavior could exist. 
Take the following example. The vehicle should look for a re-charging station 
(simulated by a light) when the level of energy drops below some predetermined 
value. The presence of light should be taken into consideration in case the vehicle
needs to re-charge, and should be disregarded otherwise. This kind of conduct is
achieved by letting the arbitration module impose a zero on the look-for-light
behavior activity value (in case it does not need to re-charge), or consider
the activity value in the actuator evaluation (otherwise).
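
A minimal sketch of this inhibition step, assuming the arbitrator holds the activity values in a dictionary keyed by behavior name (the data layout and the function name are hypothetical):

```python
def inhibit(activities, needs_recharge):
    """Return a copy of the activity values with look-for-light forced to zero
    whenever the vehicle does not need to re-charge."""
    gated = dict(activities)
    if not needs_recharge:
        gated["look-for-light"] = 0.0
    return gated

activities = {"avoid-front-obstacles": 0.3, "look-for-light": 0.9}
print(inhibit(activities, needs_recharge=False))
print(inhibit(activities, needs_recharge=True))
```

The gated values are then passed to whichever arbitration method is in use, so the behavior itself needs no knowledge of the energy level.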



4.3 Behavior Incompatibility 

When more than one behavior with the same intention or purpose exists, there 
is the need to decide which behavior is allowed to influence the actuators,
because there must be exclusivity in the decision. The same argument holds for
fuzzy controller switching inside a behavior, because the controllers implement the same
competence in different ways, and so they are incompatible or could be
contradictory. In the present approach, and as mentioned before, there must be a state
variable upon which to ground the decision to choose another fuzzy controller.
In the illustrative example that has been used in the explanations, this state 
variable is energy; namely, when the available energy is below some threshold 
value the piloting mode is to follow walls, as opposed to wander which is the 
mode when the available energy is high enough. 



5 Experiments and Results 

Let us briefly recall the experiment. Khepera wanders until it falls short
of energy. As soon as this happens, it looks for walls, because it is known that
these are the probable locations of the re-charging station. When it finds one, it stops
for a while (simulating re-charging time) and resumes movement in a different
direction than before (selected randomly).

There are four fuzzy behaviors: 1) avoid-left-obstacles, 2) avoid-front-obstacles,
3) avoid-right-obstacles and 4) look-for-energy. Behavior 1 input is connected
to sensors 0, 1 and 2 of Khepera (recall Fig. 1; see also the footnote on sensor ‘2’),
behavior 2 input is connected to sensors 2 and 3 (according to expressions (1)
and (2)), behavior 3 input is connected
to sensors 3, 4 and 5, and behavior 4 input is connected to sensors 1, 2, 3
and 4.

Behaviors 1, 2 and 3 have two fuzzy controllers in the FC block: one allowing 
wandering behavior and the other allowing wall following behavior. In each case, 
the state variable membership function used is the one labeled “sigmoide” in
Fig. 2. In the case of look-for-energy behavior, the state variable membership
function is the one labeled “light-sigmoide”.

Each fuzzy variable is defined by three fuzzy terms, with the output fuzzy 
variables of wall following mode being the exception: there are four fuzzy terms 
in left-motor and right-motor variables. Since it is not easy to manually tune 
fuzzy models, the use of four fuzzy terms per variable was a simple way to make 
Khepera contour the obstacles. Concerning the fuzzy controller definition, the 
following operators were used: aggregation by min, composition by max and 
defuzzification by centroid. 
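
The operator choice named above (min, max, centroid) can be sketched as follows. Everything specific in this example is an assumption made for illustration: the triangular output terms, the clipping of each rule's output at its firing strength, the integer motor universe from -10 to 10, and the names `tri`, `OUT_TERMS` and `infer`. The actual membership functions were tuned by hand and are not reproduced here.

```python
def tri(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

# Hypothetical output terms over the motor command range [-10, 10].
OUT_TERMS = {"backward": (-20, -10, 0), "stop": (-10, 0, 10), "forward": (0, 10, 20)}

def infer(rules, degrees, universe=range(-10, 11)):
    """rules: list of (antecedent term names, output term).
    degrees: membership degree of each input term for the current crisp inputs."""
    num = den = 0.0
    for y in universe:
        # Aggregation by min over the antecedents, output clipped at that strength,
        # composition by max over all rules.
        mu = max(
            min(min(degrees[t] for t in antecedent), tri(y, *OUT_TERMS[out]))
            for antecedent, out in rules
        )
        num += y * mu
        den += mu
    return num / den if den > 0 else 0.0  # centroid of the composed output set

rules = [(("far", "negative"), "forward"), (("close", "negative"), "backward")]
print(infer(rules, {"far": 0.7, "close": 0.3, "negative": 1.0}))
```

One such inference would run per output variable (left-motor and right-motor), each with its own rule consequents.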

Energy drops linearly with time from the maximum value (MaxEnergy) to
0. Note that μ(MaxEnergy) = 1 and μ(0) = 0 for the energy state variable
membership function. The threshold value to switch between the wandering and
wall following piloting modes is μ(energy) = 0.2, which is an experimental value
giving Khepera enough time to find the light source before the energy fades away.
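
The energy-driven switch between piloting modes can be sketched as below. The numeric value of `MAX_ENERGY`, the linear shape of the membership function between its two stated end points, and the function names are assumptions; only the end points μ(0) = 0, μ(MaxEnergy) = 1 and the 0.2 threshold come from the text.

```python
MAX_ENERGY = 1000.0  # hypothetical value; the text only names it MaxEnergy

def energy_at(t, lifetime):
    """Energy dropping linearly from MAX_ENERGY at t = 0 to 0 at t = lifetime."""
    return MAX_ENERGY * max(0.0, 1.0 - t / lifetime)

def mu_energy(energy):
    """Assumed linear membership with mu(0) = 0 and mu(MAX_ENERGY) = 1."""
    return min(max(energy / MAX_ENERGY, 0.0), 1.0)

def piloting_mode(energy, threshold=0.2):
    """Select which fuzzy controller of each avoid-* behavior is active."""
    return "follow-walls" if mu_energy(energy) < threshold else "wander"

for t in (0, 50, 90):
    e = energy_at(t, lifetime=100)
    print(t, e, piloting_mode(e))
```

The returned mode would be used by the arbitrator to pick the wandering or wall-following controller inside behaviors 1, 2 and 3.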

Two kinds of experiments are presented: one where the behavior arbitration
is made by “weight” and the other where the “max” method is used. The results
of the simulation are presented in Figs. 6 and 7. The environment is composed of
walls and one light source representing the re-charging station (denoted by the
symbol ‘L’). The legend is as follows: the symbol ‘+’ denotes walls, light lines
represent trajectories where the vehicle is wandering and dark lines represent
trajectories where the vehicle is following walls. Figure 7 illustrates the trajectory
broken into a set of eight figures for the sake of clarity; it should be read from left to
right and top to bottom. There is an arrow showing the starting point and the
heading of Khepera. The starting behavior is wander. Note that the lines denote
the trajectory of the center of the robot, which means that it comes closer to
the walls than the figures suggest.

As a first observation, note that on the left side of Fig. 6 Khepera misses the
presence of the light source because the evidence of light is always less than the
evidence of a wall on its right side (this fact is confirmed on the right side of Fig. 6,
where the activity values for both behaviors are shown). If it really depended on
energy, Khepera would “die”. It must be said that this is not always the case.
If the distance to the wall increases by a small amount then it could be enough
for the evidence of light to increase and the evidence of the wall to decrease,
resulting in Khepera finding the light source. Note that this could happen
because the Khepera distance sensors are nonlinear and noisy; as a consequence
the trajectories are not always equal. This is only a local proof and not a general
demonstration of the inadequacy of the “max” arbitration method. However,
this fact shows a drawback of this method when compared with arbitration by
“weight” (in Fig. 7 the light position is the same as in Fig. 6).

Footnote: It could be argued that sensor ‘2’ is not on the left side of the vehicle, and so it
should not be part of behavior 1. This is partly true. The fact is that the presence
of sensor ‘2’ is most helpful in realizing the contour (left) obstacles mode.

Fig. 6. Left: Simulated trajectory resulting from the “max” arbitration. Right: State
variable values in wall following mode for fuzzy behaviors avoid-right-wall and
look-for-light, in the vicinity of the light

As a second experiment, the “max” method is used and the light source is
moved closer to the wall, as illustrated in Fig. 8. Now an informal comparison
can be made between both arbitration methods. This comparison is not
straightforward, however, since the system shows dependence on initial
conditions. Notwithstanding, it is observable that in the “weight” method walls act
as more repulsive obstacles to the vehicle than in the “max”
method, i.e., the vehicle spends more time turning. In fact, the amount of time
that each behavior is influencing the actuators is larger in the “weight” method.
Additionally, the “weight” method shows a better performance when guiding in
corridors because it tends to cancel trajectory oscillations. Because this effect is
not very clear in Figs. 7 and 8, the result of an experiment where the vehicle
travels in a small corridor is provided in Fig. 9. It should be stressed that the
knowledge bases used in each behavior are not changed from one arbitration
method to another (i.e., from Fig. 7 to Fig. 8), which is an important feature of
the proposed method.

6 Conclusion 

This paper reports work on autonomous vehicles, where piloting is accomplished
with fuzzy logic. Our main motivation was to assess the adequacy
of fuzzy logic to support a behavior-based approach to autonomous vehicle
piloting. Each behavior is composed of an action module, implemented by at least
one fuzzy controller, and an activity value which is (for now) implemented by a
membership function, supported on the environment state.

Each behavior is connected to the arbitrator and there is no hierarchy among
behaviors. The arbitrator’s role is to create action from the action and activity
values supplied by each behavior. Two ways of realizing arbitration were proposed,
which in essence are defuzzification methods: “weight” and “max”.

Fig. 7. Total simulated trajectory resulting from the “weight” arbitration (the
trajectory is separated in pieces for sake of clarity; time flows from left to right and top to
bottom)

Fig. 8. Total simulated trajectory resulting from the “max” arbitration (the trajectory
is separated in pieces for sake of clarity; time flows from left to right and top to bottom)

Fig. 9. Simulated piloting in a corridor with arbitration by “weight” and by “max”,
where the arrow indicates the starting point and the initial direction of the movement

Results of simulation experiments were presented, where the adequacy of the 
proposed approach to the piloting task was shown. 

Deliberately, not much attention was given to how optimal the knowledge
bases of the presented fuzzy controllers are. This constitutes an important
point for performance tuning, and so it should be addressed, for instance with
a genetic algorithm.

The arbitration problem is also a challenge, and so it deserves future
efforts.

Other open issues worth future research are: i) creation of other useful
behaviors, such as piloting in tight corridors and piloting in agglomerated or
overcrowded areas; ii) creation of monitoring behaviors with the capability of avoiding
stagnation or trajectory pattern repetition, as happens when the vehicle
contours an isolated obstacle; iii) architecture extensibility to many behaviors; iv)
map building with the intent to know the re-charging station location; and v)
development of strategies for measuring the performance of long-range autonomy.

Abstract. This work is primarily based on the use of software agents for
automated negotiation. We present in this paper a test-bed for agents in an
electronic marketplace, through which we simulated different scenarios 
allowing us to evaluate different agents’ negotiation behaviours. The system 
follows a multi-party and multi-issue negotiation approach. We tested the 
system by comparing the performance of agents that use multiple tactics with 
ones that include learning capabilities based on a specific kind of 
Reinforcement Learning technique. First experiments showed that the adaptive 
agents tend to win deals over their competitors as their experience increases. 



1 Introduction 

The popularity of the Internet and the WWW has strongly increased the importance and popularity of
electronic commerce, which goes far beyond both the searching for services exposed
in virtual stores and price comparison. Electronic commerce includes a negotiation 
process between buyers and sellers, in order to find an agreement over the price and 
other transaction terms. It also deals with service demands and electronic payment. 
Common systems that consider mainly the electronic ads, based on the creation of 
classified ads sites with search capabilities, can be enhanced by semi-intelligent 
mechanisms relying on agent technology. Such enhanced systems can be applied to 
the different stages of the Consumer Buying Behaviour model, as explained in [3] [4]. 
In particular, when applied to the negotiation process, agents enable new types of
transactions, where prices and other transaction issues no longer need to be fixed.

A typical case of negotiation process is the auction. Negotiation on the Internet 
often amounts to one party (typically the seller) presenting a take-it-or-leave-it 
proposal (e.g., a sale price). Auctions represent a more general approach to look for 
an appropriate price, admitting a range of negotiation protocols [7]. These auction- 
based negotiation protocols include the English auction, the Dutch auction, the First- 
price Sealed Bid and the Vickrey auction. These auction-based protocols are called 
single-sided mechanisms, because bidders are all buyers or all sellers, and they also 
include an auctioneer. Double-sided auctions admit multiple buyers and sellers at 
once, like the continuous double auction [7]. There are several auction-based systems
that are in use for electronic commerce. 

AuctionBot is an auction server where software agents can be created to participate 
in several types of auctions. Kasbah is a web-based system where users create 
autonomous agents that buy and sell goods on their behalf [1]. It is based on 
continuous double auction mechanisms. In the Kasbah marketplace, buying and 
selling agents interact and compete simultaneously, making price bids to their 
counterparts. The agents have a function that changes the price they are willing to 
propose over time. A different negotiation approach involves negotiating over 
multiple terms of a transaction, unlike the typical way of auction negotiation, that 
only considers the price of the good. Tete-a-Tete is a system where agents 
cooperatively negotiate in this way. It intends to provide merchant differentiation 
through value-added services, like warranties, delivery times, etc. [4]. 

Following this more general approach, in [2] negotiation is defined as a process by 
which a joint decision is made by two or more parties. The parties first verbalise 
contradictory demands and then move towards agreement by a process of concession 
making or search for new alternatives. In that paper, a many parties, many issues 
negotiation model is adopted, that is, multilateral negotiations (like a continuous 
double auction) about a set of issues (the transaction terms). Several negotiation 
tactics are tested, and a negotiation strategy model is explained. 

The work described in the present paper aims to combine those tactics in a 
dynamic way, in order to endow adaptive market agents with strategies (appropriate 
ways of selecting combinations of tactics) and to compare their performance with 
agents that are not adaptive. Since the marketplace is a dynamic environment, 
adaptive agents are expected to benefit from changing conditions, therefore taking 
advantage over others. A test-bed has been created to provide the interaction process 
between market agents with different capabilities. 

Section 2 describes our multi-agent platform for electronic commerce (the test- 
bed), including the negotiation assumptions, model and protocols adopted (which 
have been based on those introduced in [2]). Section 3 gives details on negotiation 
tactics (also adopted from [2]) which can be combined to either user-defined 
strategies or strategies based on Reinforcement Learning techniques. Section 4 
focuses on several different scenarios and describes those situations that were chosen 
for testing purposes. Conclusions drawn from the testing scenarios are presented in 
section 5. We conclude the paper, in section 6, by presenting some topics of our 
future work. 



2 A System for Electronic Commerce 

In this section we describe the basic negotiation approach and the architecture of the 
SMACE (Sistema Multi-Agente para Comercio Electronico) system. It is a multi- 
agent system for electronic commerce, where users can create buyer and seller agents 
that negotiate autonomously, in order to make deals on services they are requesting or 
offering. SMACE has been used as a test-bed for different negotiation paradigms, 
both user-controlled and self-adaptive. 

2.1 Negotiation Model

The negotiation model we have adopted is multilateral and based on many issues (that 
is, multidimensional), as described in [2]. Multilateral refers to the ability that each 
buying or selling agent has to negotiate simultaneously with many other selling or 
buying agents. In auction terms, it relates to a sealed-bid continuous double auction, 
where both buyers and sellers submit bids (proposals) simultaneously and trading 
does not stop as each auction is concluded (as each deal is made). In practical terms, 
this multilateral negotiation model works as many bilateral negotiations that can 
influence each other. The bilateral negotiation model we are using is based on the one 
defined in [5]. In our work, negotiation is realised by exchanging proposals between 
agents. The negotiation can be made over a set of issues instead of the single issue 
price found in most auctions. A proposal consists of a value for each of these issues 
and is autonomously generated and proposed by one agent. 

This negotiation model is also called a service-oriented negotiation model [2], 
since it involves two roles that are, in principle, in conflict: sellers of services and 
buyers of services. A service is something that can be provided by an agent and 
requested by some other agent, and over which they can negotiate. As a general rule, 
we can say that opponents in the negotiation process have opposing interests over all 
issues under negotiation. However it could happen that both agents have similar 
interests on a specific issue, thus proposal evolution may proceed in the same 
direction. 

The sequence of proposals and counter-proposals in a two-party negotiation is 
referred to as a negotiation thread. These proposals and counter-proposals are 
generated by linear combinations of functions, called tactics. Tactics use a certain
criterion (time, resources, etc.) in generating a proposal for a given negotiation issue.
Different weights can be assigned to each tactic used, representing the importance of 
each criterion for the decision making. Agents may wish to change their ratings of 
criteria importance over time. To do so, they use a strategy, defined as the way in 
which an agent changes the relative weights of the different tactics over time. 
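
As a rough illustration of how tactics and weights interact, the following Python sketch combines the values suggested by several tactics for one issue into a single proposal value. The function name `combine_tactics` and the example numbers are assumptions made for this sketch; they are not taken from the SMACE implementation.

```python
from typing import Sequence


def combine_tactics(tactic_values: Sequence[float], weights: Sequence[float]) -> float:
    """Linear combination of the values suggested by each tactic for one issue.

    The weights are assumed to be non-negative and to sum to one, so the
    result stays between the smallest and the largest suggested value.
    """
    if len(tactic_values) != len(weights):
        raise ValueError("one weight is needed per tactic")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to one")
    return sum(w * v for w, v in zip(weights, tactic_values))


# Hypothetical values suggested for the issue "price" by three tactics.
suggested = [4.0, 6.0, 8.0]

# A strategy changes the weights over time; two snapshots are shown here.
early_price = combine_tactics(suggested, [0.8, 0.1, 0.1])
late_price = combine_tactics(suggested, [0.1, 0.1, 0.8])
```

In this reading, a strategy is simply whatever rule produces the weight vector passed to `combine_tactics` at each point in time.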

For each issue $j \in \{1, \ldots, n\}$ under negotiation, each agent $i$ has a range of 
acceptable values $[min_j^i, max_j^i]$, and a scoring function 
$V_j^i: [min_j^i, max_j^i] \to [0, 1]$, that gives the score agent $i$ assigns to a value 
of issue $j$ in the range of its acceptable values. The higher the score, the better the 
agent’s utility. Agents also assign a weight $w_j^i$ to each negotiation issue that 
represents its relative importance. Assuming normalised weights ($\sum_j w_j^i = 1$), the 
agent’s scoring function for a given proposal $x = (x_1, \ldots, x_n)$ combines the scores 
of the different issues in the multidimensional space defined by the issues’ value 
ranges: $V^i(x) = \sum_j w_j^i V_j^i(x_j)$. The overall proposal evaluates to a score of 
zero if any of the issues’ values is outside its range. 
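
The additive scoring function can be sketched in Python as follows. This is only a minimal illustration: the linear per-issue scoring functions, the issue names and the numeric ranges are assumptions chosen for the example, since the model itself only requires each scoring function to map the acceptable range onto [0, 1].

```python
from typing import Callable, Dict, Tuple

Range = Tuple[float, float]


def linear_score(lo: float, hi: float, increasing: bool) -> Callable[[float], float]:
    """Build a linear scoring function on [lo, hi] (an illustrative choice)."""
    def score(value: float) -> float:
        fraction = (value - lo) / (hi - lo)
        return fraction if increasing else 1.0 - fraction
    return score


def proposal_score(
    proposal: Dict[str, float],
    ranges: Dict[str, Range],
    weights: Dict[str, float],
    scorers: Dict[str, Callable[[float], float]],
) -> float:
    """Weighted sum of per-issue scores; zero if any value is out of range."""
    total = 0.0
    for issue, value in proposal.items():
        lo, hi = ranges[issue]
        if not lo <= value <= hi:
            return 0.0
        total += weights[issue] * scorers[issue](value)
    return total


# Hypothetical buyer: lower price and shorter delivery time are both better.
ranges = {"price": (1.0, 10.0), "delivery": (1.0, 5.0)}
weights = {"price": 0.75, "delivery": 0.25}
scorers = {name: linear_score(lo, hi, increasing=False) for name, (lo, hi) in ranges.items()}

mid_offer = proposal_score({"price": 5.5, "delivery": 3.0}, ranges, weights, scorers)
out_of_range = proposal_score({"price": 11.0, "delivery": 2.0}, ranges, weights, scorers)
```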



2.2 Negotiation Protocol 

We assume for the negotiation protocol that message delivery is reliable and that message 
delays need not be considered (because they are presumably short). 

At a particular point in time each agent has an objective that specifies its intention 
to buy or sell a specific service. That objective has to be achieved in a certain amount 
of time, specified by a deadline. Negotiation stops when this deadline is reached. 







A bilateral negotiation starts after the two parties - the buyer and the seller - meet 
in the marketplace and match their objectives (i.e., they agree that what the buyer 
wants to buy is what the seller intends to sell). Once this is achieved, a negotiation 
thread becomes active between the two agents. It will stay active while the agents 
negotiate with each other. 

The agents will then exchange a sequence of proposals and counter-proposals. 
When receiving a proposal, an agent will generate a counter-proposal. Both - the 
received proposal and the generated counter-proposal - will be evaluated using the 
scoring function described above, and the agent will answer in one of three possible 
ways: 

• withdraw from negotiation if the deadline was reached, or if a deal was made with some other agent; 

• accept the proposal received, if it scores higher than the one generated; 

• otherwise, send the generated proposal. 
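
The three possible answers can be sketched as a small decision function. The names `Reply` and `choose_reply` are hypothetical, and the scores are assumed to have been computed beforehand with the agent's scoring function.

```python
from enum import Enum


class Reply(Enum):
    WITHDRAW = "withdraw"
    ACCEPT = "accept"
    PROPOSE = "propose"


def choose_reply(
    now: float,
    deadline: float,
    deal_made_elsewhere: bool,
    score_received: float,
    score_generated: float,
) -> Reply:
    """Pick the answer to a received proposal, following the three cases above."""
    # Case 1: the deadline was reached or a deal was closed with another agent.
    if now >= deadline or deal_made_elsewhere:
        return Reply.WITHDRAW
    # Case 2: the opponent's proposal scores higher than our own counter-proposal.
    if score_received > score_generated:
        return Reply.ACCEPT
    # Case 3: otherwise, send the counter-proposal.
    return Reply.PROPOSE
```

Note that the withdrawal test comes first, so an attractive proposal arriving after the deadline is not accepted in this sketch.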

When an agent b receives an accept message from agent a, it will respond with a 
deal confirmation or rejection. Since there can be virtually any number of agents in 
the marketplace, this solves the problem that could arise if an agent receives two 
simultaneous accept messages from two different agents (we are assuming that the 
agent’s objective only admits trading one unit of a service at a time). Therefore, if 
agent b has not committed to any other agent, it will close the deal with agent a, the 
sender of the accept message. Agent a will wait until it gets an answer from agent 
b. A deadlock problem could arise if a group of agents were waiting for deal 
confirmations in a closed loop configuration. We address this problem by making 
agent a, which is expecting a deal confirmation, withdraw from any other negotiations and 
reject any deal with any other agent. This means that it may lose potential 
deals with other agents if it then receives a rejection from agent b. Since we are assuming 
short message delays, the problem of lost deals is not likely to occur often, and we 
ignore it. 



2.3 SMACE 

SMACE allows users to create buyer and seller agents that negotiate under the model 
and protocol described above. The system was implemented with the JDK 1.1.4 API 
[9], and uses the JATLite [8] package to easily build agents that exchange KQML [10] 
messages. The agents communicate with each other in the Marketplace, which is an 
enhanced JATLite router, facilitating the message routing between the agents and 
working as an information centre for the agents to announce themselves and search 
for contacts. 

The SMACE API consists of three layers built on top of the JATLite packages. 
These layers also consist of packages, and using the SMACE system can take place at 
any of them: 

• Infrastructure - this layer consists of two fundamental parts: 
  • MarketAgent: a template for the creation of market agents. It already implements 
    the negotiation model and its associated protocol. The only task left to the user 
    starting in this layer is providing his own negotiation tactics; 
  • Marketplace: the application that represents the marketplace, as a space where 
    the agents meet and trade. It includes message routing and agent brokering 
    facilities. 

• Plug&Trade - this layer includes predefined market agents that can also be seen as 
  examples of how an agent can be built using the MarketAgent template: 
  • MultipleTacticAgent (MTA): a market agent that is able to use a weighted 
    combination of three tactics, one from each of the tactic families described in 
    the next section, to generate its negotiation proposals. 
  • AdaptiveBehaviourAgent (ABA): a market agent that is able to weight the 
    several tactics that it is using in an adaptive way, using Reinforcement 
    Learning techniques. 

• UserInterface - this layer consists of an application that provides both an HTML 
  user interface for creating Plug&Trade market agents and monitoring their 
  operation, and their persistence. 

While accepting agents from anywhere to enter the marketplace and trade 
(provided that they use the same negotiation protocol), SMACE allows the user to 
launch predefined agents (of both MTA and ABA types) by adjusting their parameters. In 
order to do so, one can use the SMACE user interface. Through this interface the 
agents’ activities can also be monitored and their parameter settings changed. 
Furthermore, the user may create his own agent, with his own tactics, in any 
programming language or platform he wishes. The SMACE API Infrastructure 
package assists agent building in Java. This package frees the user from worrying 
about communication and negotiation protocol details, so that he can concentrate on 
building his own negotiation strategy, that is to say, the agent’s deliberative 
knowledge. 

2.3.1 Agent Matching 

As mentioned before, market agents contact the Marketplace to search for agents that 
have complementary objectives, i.e., objectives that express opposite intentions 
(buying versus selling) over the same service. 

To facilitate the matching of services, services are described using what we call 
descriptive issues, i.e., descriptor/value pairs that together define the object of 
negotiation. The values of the same descriptive issues in different objectives are then 
compared, and if the agents agree that they are “talking” about the same service they 
will negotiate over it. The SMACE system can be easily configured to specify the 
descriptive issues that market agents will use to describe their own services. 
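
A minimal sketch of this matching step is given below. The `Objective` class, the intention labels and the exact-equality comparison are assumptions made for illustration; the text does not specify how SMACE compares descriptive issues internally.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Objective:
    intention: str                       # assumed labels: "buy" or "sell"
    descriptive_issues: Dict[str, str]   # descriptor -> value


def objectives_match(a: Objective, b: Objective) -> bool:
    """Two objectives are complementary if the intentions are opposite and
    every descriptive issue has the same value (exact match assumed here)."""
    opposite_intentions = {a.intention, b.intention} == {"buy", "sell"}
    return opposite_intentions and a.descriptive_issues == b.descriptive_issues


# Hypothetical objectives described by two descriptive issues.
buyer = Objective("buy", {"category": "book", "title": "Agents"})
seller = Objective("sell", {"category": "book", "title": "Agents"})
other_seller = Objective("sell", {"category": "book", "title": "Auctions"})
```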

2.3.2 Negotiation Issues 

In order to negotiate properly, besides using the same communication and negotiation 
protocols, market agents should agree on which issues the negotiation will be about. 

The SMACE system can be easily configured to consider any number of 
negotiation issues. All the market agents created in the system will then use the same 
set of issues. These issues are all considered as uniform, in the sense that, for the 
market agents, they have no semantics attached. Therefore, each market agent 
will define a weight, a range of acceptable values and a scoring function for each one 
of the issues used. Opposite intentions (buying and selling) usually imply somewhat 
contrasting scoring functions, e.g., the scoring function for the issue price is 
decreasing for buyers and increasing for sellers as the price increases. 

3 Tactics and Strategies 

One of the main aims of our work is to compare the performance of agents using 
strategies based on dynamic adaptive behaviours with those based on more 
conventional and less dynamic, or even static, behaviours. 

The goal of negotiation is maximising the utility gained in a transaction, and in 
order to do so the focus is on how to prepare appropriate proposals as well as counter- 
proposals. 

The predefined SMACE market agents use a specific tactic or a combination of 
several tactics to generate proposals. We focused on tactics that we adopted from [2]: 

• Time-dependent tactics: agents vary their proposals as the deadline approaches. 
  These tactics use a function depending on time that can be parameterised. 

• Resource-dependent tactics: agents vary their proposals based on the quantity of 
  available resources. These tactics are similar to the time-dependent ones, except 
  that the domain of the function used is the quantity of a resource other than time. 
  This is done either by making the deadline dynamic or by making the function 
  depend on an estimation of the amount of the resource. 

• Behaviour-dependent tactics: agents try to imitate the behaviour of their opponents 
  to some degree. Different types of imitation can be performed, based on the 
  opponent’s negotiation policy over a sequence of his proposals: proportional 
  imitation, absolute imitation and averaged proportional imitation. 



3.1 Time-Dependent Tactics 

Time-dependent tactics vary the proposals as the deadline approaches. An agent a has 
to find a deal until $t_{max}^a$. A proposal x for issue j from agent a to agent b at time t, 
$0 \le t \le t_{max}^a$, can be calculated as follows: 

$$x_{a \to b}^{t}[j] = \begin{cases} min_j^a + \alpha_j^a(t)\,(max_j^a - min_j^a), & \text{if } V_j^a \text{ is decreasing} \\ min_j^a + (1 - \alpha_j^a(t))\,(max_j^a - min_j^a), & \text{if } V_j^a \text{ is increasing} \end{cases} \qquad (1)$$

where $V_j^a$ is the scoring function whose gradient reflects the agent’s intention (as 
referred to in subsection 2.3.2). 

Any $\alpha_j^a(t)$ function defining the time-dependent behaviour must satisfy these 
constraints: $0 \le \alpha_j^a(t) \le 1$ (offers are inside the range); $\alpha_j^a(0) = k_j^a$ 
($k_j^a$ adjusts the initial value at initial time); $\alpha_j^a(t_{max}^a) = 1$ (the 
reservation value - the smallest result of the scoring function $V_j^a$ - will be offered 
at the deadline). 

In order to satisfy these constraints two classes of functions are presented, in the 
form given in [2]: 

• Polynomial: $\alpha_j^a(t) = k_j^a + (1 - k_j^a) \left( \frac{\min(t, t_{max}^a)}{t_{max}^a} \right)^{1/\beta}$ 

• Exponential: $\alpha_j^a(t) = \exp\left( \left( 1 - \frac{\min(t, t_{max}^a)}{t_{max}^a} \right)^{\beta} \ln k_j^a \right)$ 

The parameter $\beta \in \Re^+$ is used to adjust the convexity degree of the curve, allowing 
the creation of an infinite number of possible tactics. For values of $\beta < 1$ the 
behaviour of both classes of functions can be described as boulware, i.e., concessions 
are made close to the deadline; until then the proposals are only slightly changed. 
With $\beta > 1$ the behaviour is called conceder. An agent configured like this is eager to 
make a deal and reaches its reservation value quickly. 
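
The following Python sketch shows how a time-dependent proposal could be computed. It assumes the polynomial and exponential forms given in [2], namely k + (1 - k)(min(t, t_max)/t_max)^(1/beta) and exp((1 - min(t, t_max)/t_max)^beta ln k); the function names and the example values are hypothetical.

```python
import math


def alpha_polynomial(t: float, t_max: float, k: float, beta: float) -> float:
    """Polynomial time function (form assumed from [2]); beta must be > 0."""
    return k + (1.0 - k) * (min(t, t_max) / t_max) ** (1.0 / beta)


def alpha_exponential(t: float, t_max: float, k: float, beta: float) -> float:
    """Exponential time function (form assumed from [2]); needs k > 0 for ln k."""
    return math.exp((1.0 - min(t, t_max) / t_max) ** beta * math.log(k))


def time_dependent_offer(alpha: float, lo: float, hi: float, score_is_decreasing: bool) -> float:
    """Proposal value for one issue, following equation (1)."""
    if score_is_decreasing:                 # e.g. a buyer negotiating price
        return lo + alpha * (hi - lo)
    return lo + (1.0 - alpha) * (hi - lo)   # e.g. a seller negotiating price


# Linear behaviour (beta = 1, k = 0) over the price range [1, 10], deadline 10.
buyer_offers = [
    time_dependent_offer(alpha_polynomial(t, 10.0, 0.0, 1.0), 1.0, 10.0, True)
    for t in (0.0, 5.0, 10.0)
]
seller_offers = [
    time_dependent_offer(alpha_polynomial(t, 10.0, 0.0, 1.0), 1.0, 10.0, False)
    for t in (0.0, 5.0, 10.0)
]
```

With these settings the buyer starts at the lowest price and the seller at the highest, and both reach their reservation value at the deadline. A small beta keeps the time function close to k for most of the negotiation (boulware), while a large beta moves it close to 1 almost immediately (conceder).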



3.2 Resource-Dependent Tactics 

Resource-dependent tactics vary the proposals based on the quantity of a resource 
available. Boulware behaviour, used in the presence of a large amount of resources, 
should change to conceder behaviour when resources run short. 



3.2.1 Dynamic-Deadline Tactics 

This tactic sub-family varies the agent’s deadline according to the availability of a 
particular resource. The resource modelled here combines the number of agents that are 
negotiating and the average length of the active negotiation threads. If a selling agent 
a notices many interested parties for its good, then there is no need to urge for an 
agreement. The set of agents negotiating with agent a at time t is 

$$N^a(t) = \{\, i \mid x_{i \leftrightarrow a}^{t} \text{ is active} \,\}$$

A dynamic deadline using the resource described above is 

$$t_{max}^a = \mu^a \frac{|N^a(t)|^2}{\sum_i |x_{i \leftrightarrow a}^{t}|} \qquad (5)$$

where $\mu^a$ is the time agent a assumes to be needed to negotiate with an opponent and 
$|x_{i \leftrightarrow a}^{t}|$ is the length of the negotiation thread between agent a and agent i. 



3.2.2 Resource-Estimation Tactics 

Resource-estimation tactics measure the quantity of a resource at a time t. Function $\alpha$ 
can be used to model this, in the form given in [2]: 

$$\alpha_j^a(t) = k_j^a + (1 - k_j^a)\, e^{-resource^a(t)}$$

The function resource is here used to evaluate the amount of available resources at 
time t. The following examples model 

• interested parties: $resource^a(t) = |N^a(t)|$ 

• interested parties and negotiation threads’ length: $resource^a(t) = \mu^a \frac{|N^a(t)|^2}{\sum_i |x_{i \leftrightarrow a}^{t}|}$ 

• time: $resource^a(t) = \max(0, t_{max}^a - t)$ 

3.3 Behaviour-Dependent Tactics 

Behaviour-dependent tactics try to imitate the behaviour of the agent's opponents up 
to a certain extent. This can be useful since opponents will then not be able to exploit the 
agent’s strategy. Tactics of this family make counter-proposals influenced by the 
opponent's former actions. The following are three different ways of using imitating 
behaviours, assuming the negotiation thread 
$\{\ldots, x_{b \to a}^{t_{n-2\delta}}, x_{a \to b}^{t_{n-2\delta+1}}, x_{b \to a}^{t_{n-2\delta+2}}, \ldots, x_{b \to a}^{t_{n-2}}, x_{a \to b}^{t_{n-1}}, x_{b \to a}^{t_n}\}$ 
and $\delta \ge 1$. 



3.3.1 Relative Tit-for-Tat 

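These tactics imitate proportionally an opponent's behaviour $\delta \ge 1$ steps ago. The 
length of the negotiation thread must be $n > 2\delta$. The generated counter-proposal is: 

$$x_{a \to b}^{t_{n+1}}[j] = \min\left( \max\left( \frac{x_{b \to a}^{t_{n-2\delta}}[j]}{x_{b \to a}^{t_{n-2\delta+2}}[j]} \, x_{a \to b}^{t_{n-1}}[j],\; min_j^a \right),\; max_j^a \right)$$

The counter-proposal generated is calculated with the last proposal ($x_{a \to b}^{t_{n-1}}$) 
and the proportional evolution of two consecutive proposals from the opponent 
($x_{b \to a}^{t_{n-2\delta}}$ and $x_{b \to a}^{t_{n-2\delta+2}}$). 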

3.3.2 Random Absolute Tit-for-Tat 

These tactics imitate in absolute terms the opponent’s behaviour. They require the 
existence of a negotiation thread of length $n > 2\delta$. The resulting counter-proposal is: 

$$x_{a \to b}^{t_{n+1}}[j] = \min\left( \max\left( x_{a \to b}^{t_{n-1}}[j] + \left( x_{b \to a}^{t_{n-2\delta}}[j] - x_{b \to a}^{t_{n-2\delta+2}}[j] \right) + (-1)^s R(M),\; min_j^a \right),\; max_j^a \right)$$

where $s = 0$ if $V_j^a$ is decreasing and $s = 1$ if $V_j^a$ is increasing. 

The term R(M) provides a way to overcome local minima of the function. The function 
R(M) returns a random integer in the interval [0, M], where M is the threshold of 
imitative behaviour. 

3.3.3 Averaged Tit-for-Tat 

These tactics imitate proportionally by calculating the average evolution of a certain 
number of proposals to the last proposal. The parameter $\gamma$ refers to the number of past 
proposals that are considered. The counter-proposal obtained is: 

$$x_{a \to b}^{t_{n+1}}[j] = \min\left( \max\left( \frac{x_{b \to a}^{t_{n-2\gamma}}[j]}{x_{b \to a}^{t_n}[j]} \, x_{a \to b}^{t_{n-1}}[j],\; min_j^a \right),\; max_j^a \right)$$

for $n > 2\gamma$. The behaviour of averaged Tit-For-Tat when choosing $\gamma = 1$ is similar to 
relative Tit-For-Tat with $\delta = 1$. 
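
The three imitative tactics can be sketched as follows, assuming the formulations of [2]. Each function receives the opponent's past values for one issue (oldest first, the last entry being the proposal just received) and the agent's own last value; the function and parameter names are hypothetical.

```python
import random
from typing import Optional, Sequence


def clamp(value: float, lo: float, hi: float) -> float:
    return min(max(value, lo), hi)


def relative_tft(opponent: Sequence[float], own_last: float, delta: int,
                 lo: float, hi: float) -> float:
    """Reproduce, proportionally, the opponent's concession made delta steps ago."""
    if len(opponent) < delta + 1:
        raise ValueError("negotiation thread is too short")
    ratio = opponent[-1 - delta] / opponent[-delta]
    return clamp(ratio * own_last, lo, hi)


def random_absolute_tft(opponent: Sequence[float], own_last: float, delta: int,
                        lo: float, hi: float, max_noise: int,
                        score_is_increasing: bool,
                        rng: Optional[random.Random] = None) -> float:
    """Reproduce the opponent's concession in absolute terms, plus a random term."""
    if len(opponent) < delta + 1:
        raise ValueError("negotiation thread is too short")
    rng = rng or random.Random()
    noise = rng.randint(0, max_noise)            # R(M): integer in [0, M]
    sign = -1 if score_is_increasing else 1      # (-1)^s with s = 1 if increasing
    change = opponent[-1 - delta] - opponent[-delta]
    return clamp(own_last + change + sign * noise, lo, hi)


def averaged_tft(opponent: Sequence[float], own_last: float, gamma: int,
                 lo: float, hi: float) -> float:
    """Reproduce the opponent's overall relative change over a window of gamma proposals."""
    if len(opponent) < gamma + 1:
        raise ValueError("negotiation thread is too short")
    ratio = opponent[-1 - gamma] / opponent[-1]
    return clamp(ratio * own_last, lo, hi)


# A buyer raised its price offer from 4 to 5; the seller last asked for 10.
seller_next = relative_tft([4.0, 5.0], own_last=10.0, delta=1, lo=1.0, hi=10.0)
```

In the example the buyer conceded 25% upwards, so the ratio is 4/5 and the seller lowers its asking price from 10 to 8.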

3.4 User-Defined Strategies 

Once an agent is created and correctly initiated, it can be activated in order to contact 
the marketplace, thus starting a new episode in its life. Within an episode, the 
objective cannot be changed. The episode ends when the agent is deactivated. 

As defined in section 2.1, a strategy can be realised as the way weighted 
combinations of the tactics presented above are selected. An agent’s strategy 
determines which combination of tactics should be used at any particular instant 
within an episode. The simplest strategy is to use the same combination at any time in 
that episode. 

The SMACE UserInterface allows the user to create an MTA, which combines three 
different tactics - one from each family described before. The sub-tactics are selected 
by fixing the parameters in the corresponding tactic family. Along with these 
parameters, the weight combination remains the same, unless the user explicitly 
provides different values for it. This means that a fixed weighted combination of 
tactics is always in use. The user can interact with the UserInterface to observe agent 
actions and change its parameters any time he wants, thereby implementing his 
own strategy. 

In order to make it possible to design and add new learning capabilities to an agent, 
or for the user to reuse a successful agent, MarketAgents do not terminate after an 
episode. They can be reactivated with possibly different objectives and, if 
desired, with different issues and tactics parameters. 

3.5 Adaptive Behaviour Based Strategies 

There are several ways in which adaptive behaviour could be achieved in the 
environment described. First let us make precise the expression “adaptive behaviour”. 
SMACE provides a dynamic environment, i.e., various unknown agents are able to 
meet and negotiate for an interval of time and achieve some result by following a 
certain strategy. The strategy of an MTA is simple: it uses the weighted combination 
of tactics that the user supplies. However, it should not be necessary to supervise 
autonomous agents all the time. Once a user has specified the parameters for his 
agent’s tactics and also specified the combination of tactics by adjusting the 
respective relative weights, the agent will behave accordingly regardless of the situation 
that it is facing at each particular moment. The tactics provide a means to adapt, in 
a certain range, to different situations considering certain resources as described in 
previous paragraphs. The initial proposal also plays an important role in negotiation. 
How to choose this initial value could also be learned by experience. Using different 
weighted combinations of tactics over time in order to match the optimal one in 
each situation could enable an agent to have an adaptive behaviour and make better 
deals. However, it cannot be evaluated in advance which combination of tactics ensures the most 
success. The space of possible combinations of interrelated variables and situations is 
indeed too large. 

Using the MarketAgent template from the SMACE API described in section 2.3, a 
learning algorithm was included in a new agent - AdaptiveBehaviourAgent (ABA) - to 
provide a way to find out what is the best weighted combination of tactics in any 
situation for any issue. In the following we are referring to the approach of a matrix 
with weights per tactic and issue for a proposal [2]. Changing the 
weights that are responsible for the importance of each tactic, depending on the 
situation, leads to an adaptive mechanism that provides an agent with more 
autonomy to react appropriately depending on both its sensor input and mental state. 
A matrix of these weights per issue for a proposal from agent a to agent b at time t 
looks like the following: 

$$\Gamma_{a \to b}^{t} = \begin{pmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1m} \\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2m} \\ \vdots & \vdots & & \vdots \\ \gamma_{p1} & \gamma_{p2} & \cdots & \gamma_{pm} \end{pmatrix}$$

where $\gamma_{ij}$ is issue $i$’s weight of tactic $j$. 

From the tactics explained before, we may configure many combinations of 10 
different tactics per issue per proposal, only by varying the following parameters: 

• Time-dependent function: polynomial, exponential 

• Resource-dependent tactic: dynamic deadline, resource-estimation 

• resource: agents, agents/negotiation threads, time 

• Behaviour-dependent tactic: relative Tit-For-Tat, random absolute Tit-For-Tat, averaged Tit-For-Tat 

Let there be a vector of 10 weights for these tactics for an issue j: 
$X_j = (x_1, x_2, \ldots, x_{10})$. Provided that each $x_n \in \{0.0, 0.1, 0.2, \ldots, 1.0\}$ 
and that the weights sum to one, there are 92378 different 
weighted combinations of the 10 tactics, per issue, considering that the other 
parameters (First proposal constant, Convexity degree, Negotiation time with single 
agent, Step and Window Size) are fixed. However, applying the restrictions of an 
MTA that is allowed to combine only 3 tactics, under the same constraints, for the 
ABA it turns out that there are only 66 different weighted combinations left. That 
seems to be an acceptable number of tactic combinations to analyse, in order to find 
out the best one in each situation. The choice of the other tactics’ parameters 
is left to the user, as well as the selection of the 3 tactics, to be applied separately per 
issue. Then, it is up to the adaptive mechanism to adjust the weights for the specified 
tactics. 
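
The two counts can be checked with a short Python sketch, under the assumption that the weights are multiples of 0.1 and sum to one. Counting such vectors is a "stars and bars" problem: distributing 10 tenths among m tactics gives C(10 + m - 1, m - 1) possibilities. The function names are hypothetical.

```python
from math import comb
from typing import Iterator, List, Tuple


def count_weight_vectors(num_tactics: int, steps: int = 10) -> int:
    """Number of weight vectors with entries in {0, 1/steps, ..., 1} summing to one."""
    return comb(steps + num_tactics - 1, num_tactics - 1)


def enumerate_weight_vectors(num_tactics: int, steps: int = 10) -> List[Tuple[float, ...]]:
    """Explicitly list the weight vectors (practical only for few tactics)."""
    def compositions(remaining: int, parts: int) -> Iterator[Tuple[int, ...]]:
        if parts == 1:
            yield (remaining,)
            return
        for first in range(remaining + 1):
            for rest in compositions(remaining - first, parts - 1):
                yield (first,) + rest

    return [tuple(units / steps for units in combo)
            for combo in compositions(steps, num_tactics)]


ten_tactics = count_weight_vectors(10)      # 92378
three_tactics = count_weight_vectors(3)     # 66
actions = enumerate_weight_vectors(3)       # the 66 candidate actions of an ABA
```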

3.5.1 Applying a Reinforcement Learning Algorithm 

In order to search for the best weighted combination of tactics in each situation, 
Reinforcement Learning (RL) seems to be promising as it learns online. In contrast, 
Supervised Learning learns offline from examples provided in a training set. It is 
not adequate for our domain, since in interactive problems - like negotiating in a 
dynamic environment - it is often impractical to obtain representative examples of 
desired behaviours. RL enables agents to learn from their own experiences. A kind of 
RL, Q-learning [6], selects an action in a certain state depending on the ranking of that 
state/action pair. In our case actions are vectors of weights (weighted combinations of 
tactics). These are referred to as actions in the rest of this paper. Q-learning is 
guaranteed to converge to the optimal combinations of state/action pairs after each 
action has been tried sufficiently often. That seems to be feasible with 66 possible 
actions per issue. The ranking results from the rewards obtained when actions are applied in certain states. 
Our adaptive agent using the Q-learning RL algorithm in the SMACE framework 
executes roughly the following steps per issue: 

• determines the current state per issue 

• chooses a weighted combination of tactics by selecting one from the available 66 

• uses this weighted combination of tactics for the next proposal 

• observes the resulting rewards 
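
The learning part of these steps can be sketched with a tabular Q-learning update in the standard form of [6]. The class name `QTable` and the string states are hypothetical; the default learning rate and discount factor are the values reported in section 4.2.

```python
from collections import defaultdict
from typing import Hashable, Iterable


class QTable:
    """Minimal tabular Q-learning store, kept separately for each issue."""

    def __init__(self, actions: Iterable[Hashable],
                 learning_rate: float = 0.1, discount: float = 0.9) -> None:
        self.actions = list(actions)
        self.learning_rate = learning_rate
        self.discount = discount
        self.q = defaultdict(float)   # (state, action) -> estimated value

    def best_value(self, state: Hashable) -> float:
        return max(self.q[(state, action)] for action in self.actions)

    def update(self, state: Hashable, action: Hashable, reward: float,
               next_state: Hashable, terminal: bool = False) -> None:
        """Move Q(state, action) towards reward + discount * max Q(next_state, .)."""
        target = reward
        if not terminal:
            target += self.discount * self.best_value(next_state)
        key = (state, action)
        self.q[key] += self.learning_rate * (target - self.q[key])


# Hypothetical usage: two actions, a deal rewarded with utility 1.0.
table = QTable(actions=["a", "b"])
table.update("negotiating_late", "a", 1.0, "deal", terminal=True)
table.update("negotiating_early", "a", 0.0, "negotiating_late")
```

After the first update the value of the late state is 0.1; the second update propagates a discounted share of it back to the earlier state, which is how a zero reward during negotiation can still lead to informative values.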

The success of this process mainly depends on how the states are 
characterised. We therefore select some general settings that are supposed to stay 
unchanged during an episode and some others to identify states within an episode. We 
have assumed that what is important to learn is the optimal tactic weights for an 
episode’s static settings. These are the chosen tactics with their parameters, the 
intention (sell, buy) and the negotiation issue that all states within an episode have in 
common. This information is referred to as the FrameConditions. To distinguish 
states within an episode we then chose the number of trading partners 
(QuantityOfActiveThreads); the percentage of time left to the deadline 
(PercentageOfTimeLeft); and whether negotiation is actually established, the agent is 
waiting for opponents, or it has already reached the deadline or made a deal 
(StateFlag). Thus a state can be represented as follows: 

State = (StateFlag, QuantityOfActiveThreads, PercentageOfTimeLeft, FrameConditions) 

FrameConditions = (Issue, Intention, TimeDependentTacticParameters, 
ResourceDependentTacticParameters, BehaviourDependentTacticParameters) 



Now, the rewarding mechanism is chosen as follows. While making transitions 
between states in which the agent is negotiating, the reward is always zero for each 
issue’s weighted combination of tactics. This is neutral with respect to the 
user-specified conceder or boulware behaviour. Giving a negative reward would force the 
agent to need fewer proposals to reach a deal, but that would be at odds with a 
predominantly boulware behaviour. Changing to a state that is indicated as a deal 
state, i.e., where the agent got an agreement, is rewarded with the agent’s own utility 
per issue. The circumstances of the deal, i.e., the values on which the agents agreed, 
influence the rewards. Different rewards honour those actions that increase the utility 
of an agent for that issue. Communication costs are not considered here. To calculate 
the utility per issue, the scoring function of each negotiable issue is used. In case of 
an agreement at the reservation value the utility is zero. The maximum utility is 
reached for the best possible score of the issue. Losing a deal for any reason 
indicates a bad weighted combination of tactics. Thus making a transition from a 
negotiating state to one where either the deadline was reached or the opponent 
decided to stop negotiating is punished, that is, rewarded with a negative value. This 
allows distinguishing this situation from deals at reservation values. The penalty 
mechanism also considers that unexplored actions might be better than those that have 
been punished. 
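
A minimal sketch of this rewarding scheme is shown below. The flag names and the penalty of -1.0 are assumptions made for illustration; the text only states that a lost deal is rewarded with a negative value.

```python
def transition_reward(new_flag: str, utility: float = 0.0, penalty: float = -1.0) -> float:
    """Reward for moving from a negotiating state into the state flagged `new_flag`.

    new_flag: "negotiating", "deal", "deadline" or "opponent_withdrew"
              (hypothetical labels for the StateFlag values).
    utility:  the agent's own score for the agreed value of this issue, in [0, 1].
    penalty:  assumed negative reward for a lost deal.
    """
    if new_flag == "deal":
        return utility
    if new_flag in ("deadline", "opponent_withdrew"):
        return penalty
    return 0.0   # still negotiating: neutral reward
```

Because a deal at the reservation value yields a reward of zero while a lost deal yields a negative reward, the two outcomes remain distinguishable to the learner.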

Q-learning chooses, for each issue, the highest scored weighted combination 
of tactics. However, as the environment is dynamic, the same action may not lead to the 
desired result when applied to the same state. As a trade-off between exploitation of 
actions already considered good and exploration of yet unknown ones (or ones considered 
inferior in the past), Q-learning selects a non-greedy action with a certain probability. 
This can be achieved, for instance, with an action selection mechanism that uses 
either 

1. a small probability $\epsilon$ of choosing uniformly a non-greedy action (Epsilon-Greedy [6]) or 

2. a given degree of exploration T for choosing between non-greedy actions, while 
considering their ranking (this is called the Softmax [6] approach). 
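
Both selection mechanisms can be sketched in Python as follows. The Softmax variant uses the usual Boltzmann form of [6], with probabilities proportional to exp(Q / T); the function names and the example Q-values are hypothetical.

```python
import math
import random
from typing import Dict, Hashable, Optional


def epsilon_greedy(q_values: Dict[Hashable, float], epsilon: float,
                   rng: Optional[random.Random] = None) -> Hashable:
    """With probability epsilon pick a non-greedy action uniformly, else the greedy one."""
    rng = rng or random.Random()
    greedy = max(q_values, key=q_values.get)
    others = [action for action in q_values if action != greedy]
    if others and rng.random() < epsilon:
        return rng.choice(others)
    return greedy


def softmax_probabilities(q_values: Dict[Hashable, float],
                          temperature: float) -> Dict[Hashable, float]:
    """Boltzmann distribution over actions; a lower temperature is greedier."""
    highest = max(q_values.values())   # subtracted for numerical stability
    weights = {a: math.exp((q - highest) / temperature) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def softmax_choice(q_values: Dict[Hashable, float], temperature: float,
                   rng: Optional[random.Random] = None) -> Hashable:
    rng = rng or random.Random()
    probabilities = softmax_probabilities(q_values, temperature)
    actions = list(probabilities)
    return rng.choices(actions, weights=[probabilities[a] for a in actions], k=1)[0]


# Hypothetical Q-values for three weighted combinations of tactics.
q_values = {"mostly_boulware": 0.2, "mostly_linear": 0.5, "mostly_conceder": 0.9}
probabilities = softmax_probabilities(q_values, temperature=0.1)
```

Unlike Epsilon-Greedy, Softmax takes the ranking into account: a non-greedy action with a Q-value close to the best one is chosen far more often than a clearly inferior one.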



4 Experimental Scenarios 

In a dynamic environment such as the one provided by SMACE there could be an infinite 
number of scenarios, depending on the existing market agents and their objectives and 
negotiation settings. In this section we describe some situations that can prevent 
negotiation from being started or deals from being made. We provide a fixed scenario 
that avoids those situations in order to successfully test our market agents. 

4.1 Problematic Scenarios 

As all the agent’s negotiation settings are private, agents start negotiations without 
knowing if a deal is possible [2]. This private information includes their negotiation 
tactics, as well as the parameters of their negotiation issues - weights, value ranges and 
scoring functions. 

One problematic scenario involves agents with non-matching value ranges. Since 
proposals outside the range of acceptable values are always rejected, a deal in this 
situation is impossible, but as the negotiation process proceeds, communication costs 
are unnecessarily incurred. Even when the value ranges overlap, it is not certain that the 
result of a negotiation will be a deal. That depends on a number of factors related to 
the agents’ proposal generation process, including the first proposal value. This first 
value may limit the range of proposed values to a sub-range of the acceptable values. 
Moreover, since there are no partial agreements, proposal values that are already 
acceptable could move out of their value range while other issues under negotiation 
converge to an acceptable value. Acceptable values still do not lead to a deal if the 
agent is looking for better scored values. 

4.2 Scenario Features 

In this subsection we describe some assumptions related to the scenario where the 
first evaluations of our market agents are done. 

First, since we are interested in effective negotiation testing, we provide all the 
agents with the same service offered/requested. This, together with appropriate 
deadlines, ensures that they will start negotiating. Of course, we provide the 
marketplace with both buyer and seller agents. 

Following the considerations discussed in section 2.3.2, all the agents will 
negotiate, for simplicity, over the single issue price. Since the ABAs learn 
independently per issue, this choice does not affect the results. 

In order to avoid the problem discussed in the previous subsection, we state that 
deals are always possible, that is, agents’ value ranges match totally or are subsets. 

We assume that agents with opposite intentions have conflicting scoring functions 
(as mentioned in section 2.1). 

To evaluate the possible success of the ABA model over the MTA model, we have 
chosen two different possible agent configurations. In these, MTA agents will use the 
same tactic configuration in all episodes where they are used. 

• Scenario 1: one ABA trading with one MTA (price value range of [1, 10] for both). 
  → We expected the ABA to increase its utility after a number of episodes, so we 
  could confirm the ABA’s adaptation process. 

• Scenario 2: one ABA ([3, 10]) offering a service to one MTA ([1, 10]) and 
  competing with another MTA ([1, 5]). The value ranges were configured like that 
  to make it likely for the MTAs to agree faster than the ABA, in order to stimulate 
  the ABA’s adaptation process. 
  → We expected the ABA to win deals with its best possible utility, after a number 
  of episodes. 

For all MTAs, the tactic configuration was set to a single polynomial time-dependent 
tactic, with a convexity degree parameter of one ($\beta = 1$) and a first 
proposal constant of zero ($k = 0$), in order to use the whole value range. This 
configuration represents the simplest case of an MTA for the given value range, 
because, when generating proposals, it only considers its time left. 

The ABAs always used a combination of three tactics, all from the time-dependent 
family. The tactics’ settings were similar to those used in the MTAs, except that 
different convexity degree parameters were supplied, in order to enable three different 
behaviours: boulware ($\beta = 0.01$), linear ($\beta = 1$) and conceder ($\beta = 100$). We chose a 
combination of three time-dependent tactics to make it easier to provide those three 
behaviours. These are needed so that the ABAs are able to increase their utilities, by 
trying different weighted combinations inside that behaviour range. 

Furthermore, in order to reduce the ABAs’ learning state/action space, we reduced 
the number of possible actions, by limiting the available weights. 

To consider the trade-off between exploration and exploitation, actions were 
selected by using the Softmax algorithm and a degree of exploration T = 0.1. The 
parameters for the Q-learning algorithm were set to 0.1 for the learning rate and 0.9 
for the discount factor. 

5 Conclusions 

First experimental results suggest that the ABA’s learning capabilities were sufficient 
to beat its competitor (scenario 2). We observed in a 150-episode 
experiment that the rough tendency was for the ABA to significantly increase the 
number of deals won over its competitor. However, the ABA was not successful in 
improving its utility on the deals it made. We believe this is due to the rewarding 
mechanism applied, which ranks the fact of getting a deal higher than the distinction 
between deal utilities. Furthermore, the exploration rate was not sufficient for the 
agent to try different tactic combinations in the episodes tested. The same difficulty 
appeared in scenario 1, where the ABA agent did not tend to increase the utility of its 
deals. This was also due to the fact that the use of conceder tactic combinations was 
preferred, since they lead faster to a deal and so their Q-values are increased in early 
stages of the adaptation process. 

Further experiments are needed to explain the observations described above. Also, 
by limiting the ABA to time-dependent tactics, we introduced a limitation to its 
behaviour that may have prevented it from achieving a better performance. Learning how to use 
the convexity degree parameter ($\beta$) might prove to be a more efficient way of learning 
how to use a time-dependent behaviour, since it can make it easier to reduce the 
state/action space involved. 



6 Future Work 

In the present paper, we described negotiation behaviours using independent weighted 
combinations of tactics for each one of the negotiation issues. We intend to further 
investigate weighted combinations of tactics that consider correlation between 
those issues. We believe that agents might benefit from calculating several different 
issue values, for a proposal, that influence each other. 

Another aspect of future work relates to the exploration of other features of 
Q-learning, as well as to the application and comparison of different learning algorithms 
(other kinds of Reinforcement Learning and Genetic Algorithms) that may perform 
better in a dynamic market environment. We are also interested in testing agents that 
become somewhat specialized in a specific tactic, by optimizing only the parameter 
values of that tactic to be used in a specific situation. 



References 



1. A. Chavez and P. Maes: Kasbah: An Agent Marketplace for Buying and Selling Goods. 
Proceedings of the First International Conference on the Practical Application of Intelligent 
Agents and Multi-Agent Technology (PAAM’96). London, UK, April 1996. 

2. P. Faratin, C. Sierra, and N. R. Jennings: Negotiation Decision Functions for Autonomous 
Agents. Int. Journal of Robotics and Autonomous Systems 24 (3-4) 159-182. 1998. 

3. R. Guttman and P. Maes: Agent-mediated Integrative Negotiation for Retail Electronic 
Commerce. Proceedings of the Workshop on Agent Mediated Electronic Trading 
(AMET'98). May 1998. 

4. R. Guttman, A. Moukas, and P. Maes: Agent-mediated Electronic Commerce: A Survey. 
Knowledge Engineering Review, June 1998. 

5. H. Raiffa: The Art and Science of Negotiation. Harvard University Press, Cambridge, USA, 
1982. 

6. R. Sutton and A. Barto: Reinforcement Learning: An Introduction. MIT Press, Cambridge, 
MA, 1998. 

7. P. R. Wurman, M. P. Wellman, and W. E. Walsh: The Michigan Internet AuctionBot: A 
Configurable Auction Server for Human and Software Agents. Second International 
Conference on Autonomous Agents, May 1998. 

8. JATLite (Java Agent Template, Lite). http://java.stanford.edu 

9. Java(tm) Technology Home Page. http://www.javasoft.com 

10. UMBC KQML Web. http://www.csee.umbc.edu/kqml/ 




Choice and Institutions in Agent Societies 



José Castro Caldas¹ and Helder Coelho²

¹ ISCTE, Av. das Forças Armadas, 
1600 Lisbon, Portugal 
jmcc@iscte.pt 

² Faculdade de Ciências, Universidade de Lisboa, Bloco C5, Piso 1, Campo Grande 
1700 Lisbon, Portugal 
hcoelho@di.fc.ul.pt 



Abstract. In large anonymous groups, collective action may not be sustained 
without an ‘external’ monitoring meta-agency. But, if interactions are not 
anonymous, a distributed meta-agency may sustain collective action, provided 
that, within the group, a sufficient initial level of compliance with the shared 
rules exists. Given an uneven distribution of power, shared rules that are not 
beneficial to the group may persist. We show through simulation that these are 
results that may be obtained from a simple model of bounded rational choice 
where the agent’s action interests are balanced against higher order normative 
motives. 



1. Introduction 



For a long time the relation of institutions¹ to individual behaviour and the role of 
institutions in the shaping of the economic order have not been duly taken into 
consideration. Recently, however, the topic has re-emerged, giving rise to a vast 
debate and to a movement for an Institutional Economics that permeates different 
traditions and economic paradigms [10, 12, 13, 18, 24]. The primary aim of this paper 
is to contribute to the discussion of this topic, from a perspective that combines 
ideas, methods and formalisms of Economics and AI [3]. We discuss the bounded 
rational choice [19, 20] faced by agents living in a society and we use the Genetic 
Algorithm (GA) to model group behaviour. Since a striking parallelism exists 
between the problems that Institutional Economics and Distributed AI (DAI) are 
faced with, researchers in this field will easily recognise the issues we deal with, and 
we hope that they may feel motivated to join in the debate. 



¹ Institutions have been defined by economists in different ways: “human devised constraints 
that shape human interaction” (Oliver North); “a way of thought or action of some 
prevalence and permanence, which is embedded in the habits of a group, or the customs of a 
people” (Walton Hamilton). 



2. The Environmental Setting 

This paper deals mainly with a situation of anonymous (or system mediated) 
interaction with strong external effects - that is, a situation in which: (a) the 
aggregated outcomes are determined by the actions of a large number of agents; (b) 
the aggregated outcomes determine the agents’ individual rewards; (c) the agents are 
unable to communicate; and, (d) the individual rewards may not be determined by the 
individual contribution to the collective payoff. The collective action problem 
attached has the interesting features of a game theoretic N-person repeated Prisoner’s 
Dilemma (PD), and it is obviously relevant not only for Economics. 

Trying to keep things simple, we will consider the following situation stated as an 
economic experiment: “A set of individuals, kept in isolation from each other, must 
post a contribution (from $0 to a pre-defined maximum) in an envelope, announcing 
the amount contained in it; the posted contributions are collected, summed up by the 
experimenter and ‘invested’, giving rise to a collective payoff that must be 
apportioned among the individuals; the apportioning rule instituted (known to the 
agents) stipulates that the share of the collective payoff must be proportional to the 
announced contributions (not to the posted contributions); the posted contributions 
and the corresponding announced contributions are subsequently made public (but not 
attributed to individuals); individual returns on investment are put by the 
experimenter into the corresponding envelopes and the envelopes are claimed by their 
owners.” 

Can we predict what is likely to happen after a number of repetitions of this 
experiment with the same subjects? Ledyard [16] answers: 

There are many theories. One, the economic/game-theoretical prediction, is that no 
one will ever contribute anything. Each potential contributor will try to “free-ride” on 
the others. (...) Another theory, which I will call the sociological-psychological 
prediction, is that each subject will contribute something (...) it is some times claimed 
that altruism, social norms or group identification will lead each to contribute (...x...), 
the group optimal outcome. (...) Examination of the data reveals that neither theory is 
right. 

As a matter of fact, the experimental evidence in similar cases shows that, with 
large groups, positive posted contributions are observable in the first rounds, but free- 
riding soon emerges leading the group to levels of contribution that all agents 
consider undesirable. The question therefore is: What might be wrong with the 
‘economic/game-theoretical’ and with the ‘sociological-psychological’ models? 



3. The Unidimensional Mind 

The human being, contrary to the economic man, is neither omniscient nor a super- 
computer [20]. However, apart from unbounded rationality, there might be an even 
more fundamental problem with the economic model of man. Long ago, Edgeworth 
[8], wrote: “The first principle of Economics is that every agent is actuated solely by 
self-interest”. This is still the principle on which the game theoretical/economic 
standard model is founded. The problem is that this ‘first principle’, that may seem 
crystal clear at first sight, soon becomes confusing as the question of what might be 
considered to be the interest of the agent is posed: his own well being? the well being 
of his family, of his neighbours, of his country? In fact, if the scope of self-interest is 
indefinitely extended, any act, as ‘altruistic’ as it may seem, can be interpreted as 
self-interested (or even egoistic). A game of words that leads nowhere follows, turning 
Edgeworth’s sentence into a mere tautology. Jevons [15], however, had clarified 
exactly what the marginalists (including Edgeworth) had in mind: 

As it seems to me, the feelings of which a man is capable are of various grades. He is 
always subject to mere physical pleasure or pain (...). He is capable also of mental and 
moral feelings of several degrees of elevation. A higher motive may rightly 
overbalance all considerations belonging even to the next lower range of feelings; but 
so long as the higher motive does not intervene, it is surely both desirable and right 
that the lower motives should be balanced against each other (...). Motives and 
feelings are certainly of the same kind to the extent that we are able to weigh them 
against each other; but they are, nevertheless, almost incomparable in power and 
authority. 

My present purpose is accomplished in pointing out this hierarchy of feeling, and 
assigning a proper place to the pleasures and pains with which the Economist deals. It 
is the lowest rank of feeling which we here treat. (...) Each labourer, in the absence of 
other motives, is supposed to devote his energy to the accumulation of wealth. A 
higher calculus of moral right and wrong would be needed to show how he may best 
employ that wealth for the good of others as well as himself. But when that higher 
calculus gives no prohibition, we need the lower calculus to gain us the utmost good 
in matters of moral indifference. 

This long quotation could not be avoided because it makes two points very clearly: 
(a) Economics was supposed to deal with the “lowest rank of feeling” under the 
assumption of the absence of motives arising from any “higher ranks”; (b) The 
‘feelings’ pertaining to different levels are incommensurable, they are “almost 
incomparable in power and authority”. These two points have important implications. 
First: Since the absence of motives arising from any “higher rank” can only make 
sense in a situation of interaction where actions that have positive consequences for 
an agent do not affect the others, all other situations (including therefore a large 
section of the subject matter of game theory) are out of the scope of the economic 
man model - when external effects are present, there are “no matters of moral 
indifference”. Second: in no way can the ‘hierarchy of feelings’ be aggregated in a 
single (context independent) utility function. 

Therefore, Edgeworth’s unidimensional mind concept of self-interested action does 
not fit in environments other than the typical economic situation of anonymous 
interaction with productivity related rewards. Concerning the experimental evidence, 
the standard economic/game theoretic approach must be unable to explain why 
positive contributions are observed in the first rounds of experiments: if I am self- 
interested, in the sense that I disregard the higher order obligation of contributing to 
collective goals and the prohibition of not telling the truth, and if I know that all the 
others disregard it in the same way, why should I be contributive and truthful, bearing 
alone the costs and having a benefit that is disproportional to my contribution? 

But the unidimensional mind is also present, although differently, in alternative 
accounts of human action. In what might be called a standard functionalist sociological 
explanation for the fact that individuals tend to behave in accordance with social 
norms, the emphasis would be on socialisation, “a process in which, through (positive 
and negative) sanctions imposed by their social environment, individuals come to 
abide by norms” [23], and that leads to internalisation, “according to which a 
person’s willingness to abide by norms becomes independent of external sanctions 
and, instead, becomes part of a person’s character” [23]. In reference to the 
experimental evidence, internalisation might explain the positive contribution in the 
first rounds. But would it explain the breakdown of the posted contributions 
observed with repetition? 



4. The ‘Hierarchy of Feeling’ 

If Economics is to deal with interactions within society where strong external effects 
are present, the model of man may have to be reconsidered. We are no longer dealing 
with “the lowest rank of feeling”. The agents may be endowed with a moral 
disposition [23] that drives them to behave in accordance with the rules that are 
believed to sustain the group’s existence. However, even if we accept that a moral 
disposition has to be taken into account, this propensity cannot be taken as a fixed 
parameter in the agent’s model. The moral disposition is not imprinted once and for all, 
and into all agents’ minds to the same degree, by genetics or culture. Since a shared 
rule can only produce the expected benefits if it is generally abided by, it may become 
pointless not to violate it when most others do. The moral disposition is therefore a 
variable in two senses: it varies from individual to individual, and it tends to be 
strengthened with rule compliance and weakened with the spreading of deviant 
behaviour. 

Two ideas are combined here: (a) The existence of a ‘hierarchy of feeling’ 
encompassing normative obligations; (b) The dependence of the moral disposition on 
the level of compliance with shared rules within the group. The first idea has been 
developed by different authors. Margolis [17] presented a model with individuals 
endowed with two utility functions: purely individual S preferences and purely social 
G preferences. Buchanan and Vanberg [23] speak of a distinction between action 
interests and constitutional interests. The action interest concerns personal situational 
choices within a set of alternatives, the constitutional interest is related to shared rules 
and might be defined as the individual’s interest “in seeing a certain rule implemented 
in a social community within which he operates” [23]. The second idea of a relation 
between the moral force of shared rules and the degree of compliance within the 
group was developed by Akerlof [1] and by Sugden [22]. In DAI literature similar 
concepts can be found in [6]. These two ideas together with a particular concept of 
bounded rationality are at the core of the model of an agent that will be sketched and 
highlighted in the following. 



5. Bounded Rationality in a Social Context 



Herbert Simon’s critique of the rational choice paradigm and his own concept of 
bounded rationality had a huge impact on economics and classical AI. However, 
Simon’s scenario in his seminal papers [20, 21] was one of a single agent in 
interaction with the world, or at best two agents over a chessboard (typical closed 
world assumption). The relevant aspects of decision making in social settings were 
not taken into account. It is useful to revisit Simon’s critique and model, now placing 
the decision making agent within a social environment. 

In a formal way, the environmental setting that we are interested in may be stated 
as follows: $n$ agents live in a world where the system’s state $Y$ is determined by the set 
$A = (a_1, \ldots, a_n)$ of actions of the individuals in some population. The function that 
maps $A$ into $Y$ may be unknown to the agents and it may change in time, but given a 
state $y$ of the system and the corresponding set $A$ of actions, every agent $i$ can assign 
credit to any action in $A$ using a function $f_i$ that models the current state of his 
preferences and that maps $A$ into a set of evaluations $S^i = (s^i_1, \ldots, s^i_n)$. The agents 
must recursively pick an action from the set $\mathcal{A}$ of all feasible actions in a discrete 
sequence of time periods $t_0, t_1, \ldots, t_T$. 

The value of an action to any agent in this world is context dependent. It depends 
on the agent’s preferences, on the function that maps $A$ into $Y$, and on the actions 
performed by all the other agents. His situation can be described as one of radical 
uncertainty: “Future events cannot be associated with probability distributions based 
on knowledge of the past” [7]. This uncertainty may arise from the behaviour of the 
other agents and from the aggregated behaviour of the system. 

Simon’s agents had limited perception, knowledge and computational capability. 
Their choices were guided by evaluations of the expected consequences of actions, 
but they could neither perceive the whole range of admissible actions, nor perfectly 
compute the consequences of each of them. In placing Simon’s agent in our social 
setting, a way of modelling limited perception is to assume that the actions observed 
in the present (the set A) are somehow salient to each individual. He will base his 
choice on the evaluation of actions belonging to this set. This neither implies that he 
must forget all the other actions that were observed in the past, nor that he is unable to 
perceive actions that were never observed; it is simply a consequence of imperfect 
knowledge. 

Let us have agent $i$ in time period $t$ deciding what to do in time $t+1$. The simplest 
way of modelling the decision procedure of such an agent, taking into account the 
preceding considerations, is possibly the following one. 

For the reasons given above we assume that the set $A_t$ is the one evaluated by the 
agents. Each individual has two alternatives when trying to reach a decision: 

(a) to choose an action that ‘looks promising’ from the set of observed actions. 
In this case, given the set $S^i_t$, the agent selects an action using a lottery where the 
probability of selection is somehow proportional to the credit assigned to each action 
in $A_t$ and to the frequency of that action in the population; 

(b) to choose an action in the set $\mathcal{A}$ of all feasible actions that is not included 
in $A_t$, in order to test it in $t+1$. In a world of 
limited knowledge and information there are reasons to be innovative. Opportunities 
may be hidden by the fog of uncertainty. This innovative move may be modelled in 
two ways: the agent may randomly modify the selected action in (a), or he may try the 
abductive move of recombining the selected action with other actions selected by the 
same procedure. 
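
As an illustration only, the two alternatives may be sketched as follows. The sketch assumes that actions are coded as lists of bits and that the observed set lists one action per agent; the function names `select_action`, `mutate` and `recombine`, and the shifting of credits to obtain non-negative lottery weights, are assumptions made here and are not part of the model as stated.

```python
import random

def select_action(observed, credits, rng):
    """Alternative (a): pick one observed action by lottery.

    `observed` lists one action per agent, so an action performed by
    several agents appears several times; weighting each entry by its
    credit makes the selection probability proportional to both the
    credit and the frequency of the action.
    """
    low = min(credits)
    # Assumption: shift credits so that all lottery weights are non-negative.
    weights = [c - low for c in credits]
    if sum(weights) == 0:
        return rng.choice(observed)  # all credits equal: uniform draw
    return rng.choices(observed, weights=weights, k=1)[0]

def mutate(action, prob_mut, rng):
    """Alternative (b), first kind: flip each bit with probability prob_mut."""
    return [bit ^ 1 if rng.random() < prob_mut else bit for bit in action]

def recombine(first, second, rng):
    """Alternative (b), second kind: one-point recombination of two actions."""
    point = rng.randrange(1, len(first))
    return first[:point] + second[point:]
```

Drawing from the list of observed actions, rather than from the set of distinct actions, is one simple way of letting frequency enter the lottery without a separate frequency term.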

Behind such actions we are assuming particular rules as symbolic representations 
of those actions. Since in a given culture an agent is usually able to decode an action 
into underlying rules, the operations described above for the space of actions may also 
be viewed as operations in a space of rules. We also assume that the agents are 
informed of the actions performed by other agents in a given time period, that they are 
able to assign credit to these actions (even though some of these were not directly 
experienced by them), and that they are able to code an observed action back into a 
‘program’. 



6. The Genetic Algorithm as a Model of Socio-Economic Processes 



When searching for an appropriate tool to model socio-economic processes, the 
Genetic Algorithm (GA) [14] comes up, together with other evolutionary algorithms, 
as a natural candidate. The appealing feature of the GA is that it may have 
behaviourally meaningful socio-economic interpretations. Arifovic [2] mentioned two 
alternative interpretations, referred to by Chattoe [5] as a mental interpretation and as a 
population interpretation. They may be presented as follows: (a) In the mental 
interpretation, the population represents a single mind, i.e. each chromosome in the 
population represents a rule: “the frequency with which a given rule is represented in 
the population indicates the degree of credence attached to it” [2]; (b) In the 
population interpretation, the population represents the active rule of each agent; the 
frequency of a given rule in the population indicates “the degree to which it is 
accepted in a population of agents” [2]. 

Our interpretation of the GA in this paper differs from Arifovic’s in some specific 
points². It implements the model of bounded rational choice sketched above and, with 
minor modifications to the simple GA versions, it combines elements of the mental 
and the population interpretations. The GA population represents a collection of sets 
of rules (even though each set of rules may correspond to an individual) associated 
with the set A of actions defined in section 5; the fitness function is an individual 
credit assigning function (not a system level function that determines the ‘global’ 
quality of the decision rules), and each agent is endowed with one that may be 
idiosyncratic; the selection operator implements the choice of one action from the set 
A; the mutation operator corresponds to one type of innovative move; the crossover 
operator corresponds to the abductive recombination of rules (the crossover of $a_{j1}$ and 
$a_{j2}$ leads to the rule set $a'_{j1}$, to be tested in $t+1$, possibly after having been subject to 
mutation); depending on the innovative propensity of each agent, the parameters - 
probability of mutation and probability of crossover - may vary accordingly. 

In the simulation models that follow the GA is implemented in its simple version: 
the rule sets are represented in fixed-length bit strings; biased roulette selection, one- 
point crossover and bit mutation are used as operators; the size of the population is 



² Arifovic’s use of the GA has been discussed by Chattoe (1998). 

kept constant. Given the model of the decision process already sketched in section 5, 
this GA may be summarised as in figure 1. 

    begin
      t ← 0
      randomly generate a population P(t)
      determine aggregate results for P(t)
      while not (stopping criteria) do
      begin
        for every agent j do
        begin
          assign credit to rule sets in P(t)
          select a rule set a_j1 from P(t)
          if r < prob_cross_j then
          begin
            select a rule set a_j2 from P(t)
            crossover (a_j1, a_j2) to a'_j1
          end
          else a'_j1 = a_j1
          for every bit in a'_j1
            if r < prob_mut_j then mutate that bit
        end
        determine aggregate results for P(t)
        t ← t+1
      end
    end

Fig. 1. The GA procedure³
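
The procedure of figure 1 might be rendered in Python roughly as below. This is a sketch under stated assumptions rather than the code used for the simulations: `credit_fn` is a hypothetical callback returning the credit that agent j assigns to every rule set, the new rule set of each agent is assumed to replace his old one in the next population, the stopping criterion is taken to be a fixed number of generations, and credits are shifted to obtain positive roulette weights. The crossover and mutation probabilities are the values reported for the simulations.

```python
import random

def ga_step(population, credit_fn, prob_cross, prob_mut, rng):
    """One pass of the inner loop of Fig. 1 over all agents.

    population : list of bit lists, one rule set per agent
    credit_fn  : hypothetical callback (agent_index, population) -> credits
    """
    new_population = []
    for j in range(len(population)):
        credits = credit_fn(j, population)
        low = min(credits)
        # Assumption: shift credits so that roulette weights are positive.
        weights = [c - low + 1e-9 for c in credits]
        first = rng.choices(population, weights=weights, k=1)[0]
        if rng.random() < prob_cross[j]:
            second = rng.choices(population, weights=weights, k=1)[0]
            point = rng.randrange(1, len(first))  # one-point crossover
            child = first[:point] + second[point:]
        else:
            child = list(first)
        # Bit mutation with the agent's own mutation probability.
        child = [b ^ 1 if rng.random() < prob_mut[j] else b for b in child]
        new_population.append(child)
    return new_population

def run_ga(pop_size, n_bits, credit_fn, generations, seed=0):
    """Random initial population followed by a fixed number of generations."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    prob_cross = [0.5] * pop_size   # same value for all agents
    prob_mut = [0.01] * pop_size    # same value for all agents
    for _ in range(generations):
        population = ga_step(population, credit_fn, prob_cross, prob_mut, rng)
    return population
```

Passing the probabilities as per-agent lists keeps open the possibility, mentioned in the text, of agents with different innovative propensities.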



7. Simulating the Experimental Situation 

Let us recall the experimental situation described in section 2. In each period of 
time an agent must decide on his actual contribution and on his announced 
contribution. The shared rules ‘thou shall not lie’ and ‘thou shall contribute to the 
collective goal’ are implicit. With this in mind, the rule set attached to each agent is 
implemented by coding one part of the 0/1 string (chromosome) as announced 
contribution and the other part as moral disposition; the announced contribution part 
of the string will decode into a real number between 0 and 50, and the moral 
disposition part into a real number between 0 and 1⁴. 
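
The decoding of a rule set can be illustrated as follows. The text does not state how many bits are used for each part, nor the decoding formula, so the position `split` and the linear mapping of the binary integer onto the interval are assumptions of this sketch.

```python
def decode_part(bits, upper):
    """Map a list of 0/1 values linearly onto the interval [0, upper]."""
    as_int = int("".join(str(b) for b in bits), 2)
    return upper * as_int / (2 ** len(bits) - 1)

def decode_rule_set(chromosome, split):
    """Decode a chromosome into (announced contribution, moral disposition).

    `split` is the assumed number of bits coding the announced
    contribution; the remaining bits code the moral disposition.
    """
    announced = decode_part(chromosome[:split], 50.0)
    disposition = decode_part(chromosome[split:], 1.0)
    return announced, disposition
```

With eight bits per part, for instance, a string of eight ones followed by eight zeros would decode into an announced contribution of 50 and a moral disposition of 0.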



³ P(t) stands for the population in generation t; r is a uniformly distributed random number 
between 0 and 1; prob_cross_j and prob_mut_j are the parameters that set the probability of 
crossover of a selected chromosome and the probability of mutation of each single bit for 
agent j. 

⁴ In all the simulations that follow, the Population Size is 20, the Probability of Crossover is 0.5 
and the Probability of Mutation is 0.01, for all agents. 

The posted contribution of agent i is: 

$$\text{posted contribution}_i = \text{announced contribution}_i \times \text{moral disposition}_i \tag{A1}$$

the collective return on investment is given by: 

$$\text{collective return} = 10 \sum_i \text{posted contribution}_i \tag{A2}$$

the apportioning rule by: 

$$\text{return}_i = \frac{\text{announced contribution}_i}{\sum_k \text{announced contribution}_k}\,\text{collective return} \tag{A3}$$

the credit of action j to agent i is assigned according to, 

$$\text{credit}_i = \text{return}_j - \text{posted contribution}_j \tag{A4}$$

and the collective payoff is given by: 

$$\text{collective payoff} = \sum_i \text{return}_i - \sum_i \text{posted contribution}_i \tag{A5}$$
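
For concreteness, one round of the game might be evaluated as in the sketch below. It assumes that credit and collective payoff are net of the posted contributions, and that nothing is apportioned when all announced contributions are zero; the function name `payoffs` and its argument names are introduced here for illustration only.

```python
def payoffs(announced, disposition, multiplier=10.0):
    """Evaluate (A1)-(A5) for one round.

    announced   : announced contribution of each agent
    disposition : moral disposition of each agent, in [0, 1]
    Returns (posted, returns, credits, collective_payoff).
    """
    posted = [a * m for a, m in zip(announced, disposition)]        # (A1)
    collective_return = multiplier * sum(posted)                    # (A2)
    total_announced = sum(announced)
    if total_announced == 0:
        # Assumption: nothing is announced, so nothing is apportioned.
        returns = [0.0 for _ in announced]
    else:
        returns = [collective_return * a / total_announced          # (A3)
                   for a in announced]
    credits = [r - p for r, p in zip(returns, posted)]              # (A4)
    collective_payoff = sum(returns) - sum(posted)                  # (A5)
    return posted, returns, credits, collective_payoff
```

With two agents who both announce 50, one posting in full and the other posting nothing, the call `payoffs([50.0, 50.0], [1.0, 0.0])` gives each a return of 250, so the free rider obtains a credit of 250 against 200 for the contributor. Whenever something is announced the returns add up to the collective return, so under these assumptions the collective payoff is nine times the total posted contribution.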




[Figure: total posted contributions, total announced contributions and collective payoff plotted against generations] 

Fig. 2. Contributions and collective payoffs through the simulation 

The results of a typical⁵ simulation are shown in figure 2. Until the announced 
contributions reach their maximum level, the posted contributions increase as well, 
and after this they rapidly tend to zero, while the announced contributions are kept 
close to the maximum value. Due to the existing incentive to free-ride, the initial 
moral disposition tends to erode with time. As a result, the collective payoff 
deteriorates, reaching the zero level around generation 120. After this, only occasional 
mutations (that might be interpreted as signalling intentions to contribute conditional 
on the contribution of others) disturb the scenario of collective disaster. 



⁵ Different initial populations were generated using various random generator seeds. The 
observed overall pattern of outcome is common to all. 

The results are therefore consistent with the available experimental evidence 
mentioned in section 2: positive contributions are observed in the first rounds of the 
experiment but free-riding tends to emerge, leading the group to very low levels of 
contribution. In spite of the initial moral disposition (which is unevenly distributed in 
the population), the free-riding behaviour is rewarded and eventually invades the 
population of rules. No viable social order would be possible in this context, and the 
group would simply perish. But let us not rush into easy conclusions. After all, in the 
real world, collective action exists. 



8. Solutions to the Problem of Collective Action 

Game theorists and other social researchers have invested a huge effort in trying to 
show that collective action might spontaneously come to existence and be reproduced 
without the enforcement of social norms by a coercive agency of some kind. In some 
contexts, related to co-ordination, their arguments are rather convincing. Our 
example, however, leads to a more pessimistic conclusion: In a situation with an 
N-person repeated PD structure the invisible hand would lead the group to disaster 
even if a conditional moral disposition were assumed. We refrain from generalising 
this result. Game theorists, after all, showed that even self-interested individuals might 
co-operate if the PD was indefinitely repeated. However, this result would be much 
harder to obtain for N-person games. The possibility of an anarchic order remains 
therefore open for speculation. It is not completely ruled out, whether we approach it 
in game theoretic terms, or from the historic and anthropological record, but it is far 
from having been proven convincingly. Meanwhile, we turn to the Hobbesian solution 
that is much more familiar to us. 

Hobbes’s argument on the need for a social contract and an enforcing sovereign 
power has been translated to modern terminology by game theorists: “the Hobbesian 
argument essentially turns on the claim that the problem of political obligation can 
only be solved by the creation of a co-operative political game, instead of the non-co- 
operative game played in the state of nature” [11]. Co-operative games are based on 
pre-play negotiation and binding agreements. In these terms the social contract would 
be the result of pre-play negotiation and the presence of Hobbes’s sovereign the 
condition to make this contract binding to all. 

We will extend our simulation model introducing a meta-agent with monitoring 
and sanctioning powers, while we continue to take as given the remaining institutional 
frame. Our only present aim is to test how the system would behave once the 
monitoring meta-agent is introduced by changing the setting of the experiment 
described in section 2: the experimenter (the meta-agent) may now decide to open 
some (or all) of the envelopes when they are handed to him; if an agent is found to 
have announced an amount that does not correspond to his contribution he will be 
sanctioned; his return on investment will now be determined by the following rule, 

$$\text{return}_i = \frac{\text{posted contribution}_i}{\sum_k \text{posted contribution}_k}\,\text{collective return} \times \text{moral disposition}_i \tag{A3'}$$

with the implicit penalty reverting to the meta-agent and included in the collective 
payoff. 

The meta-agent chooses the individuals to be inspected by a simple rule: if r (a 
random real between 0 and 1) is lower than probability of monitoring (a parameter of 
the simulation) then agent i’s envelope will be opened. 
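
The inspection rule and the sanctioned return may be sketched as below. The reading of rule (A3’) used here, a share computed on posted contributions and scaled by the agent’s moral disposition, is an assumption of this sketch, as are the function names; the treatment of agents whose envelopes are not opened is deliberately left out.

```python
import random

def inspected(n_agents, prob_monitoring, rng):
    """Draw, for each agent, whether the meta-agent opens his envelope."""
    return [rng.random() < prob_monitoring for _ in range(n_agents)]

def sanctioned_return(i, posted, disposition, multiplier=10.0):
    """Return of agent i under rule (A3'), as read in this sketch.

    Assumption: the share is based on posted contributions and scaled by
    the moral disposition, so a fully truthful agent (disposition 1)
    loses nothing.
    """
    total_posted = sum(posted)
    if total_posted == 0:
        return 0.0
    collective_return = multiplier * total_posted
    return posted[i] / total_posted * collective_return * disposition[i]
```

With probability of monitoring equal to 1 every draw succeeds and all agents are inspected, which corresponds to the setting of the simulation reported next.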

The results of the simulation with probability of monitoring set to 1 (all agents 
inspected; see figure 3) show that by generation 20 the maximum value for the 
posted and announced contributions is reached and kept thereafter with some 
fluctuations. This means that the selective pressures exerted by the monitoring meta- 
agent successfully counteract free-riding and induce high levels of moral disposition. 

The meta-agency may be viewed as an agency that is external to the individual 
members of the group, or as a distributed agency that includes individuals in the role 
of defenders of shared rules [6]. The two modes of meta-agency are not equivalent. With 
an anonymous interaction and no communication between agents as in the present 
environmental setting it makes no sense to consider a distributed meta-agency. But if 
the interaction is not anonymous, if the actions may be attributed to particular 
individuals, the agents may be led to consider, in their choices, a cost of transgression 
[6] related to reputation, self-esteem and other ‘emotional’ factors associated with 
acceptance by the group. In order to consider this kind of situation (relaxing for a 
moment the anonymity assumption), the experiment might be reformulated by 
choosing a group that is involved in daily interaction outside the laboratory, by 
making public each individual’s posted and announced contributions, and by letting 
the experimental subjects communicate with each other during the experiment. In this 
particular context, we might conjecture that the levels of the posted contributions 
would be much higher throughout the whole experiment. 




Fig. 3. Contributions and collective payoffs through the simulation (probability of 
monitoring = 1) 

In order to simulate this new situation, the meta-agent leaves the scene and the 
credit assignment function is reformulated in order to take into account the cost of 
transgression taken to be proportional to the scope of the transgression and the level 
of rule compliance within the group, such as: 

$$\text{credit}_i = \text{return}_j - \text{posted contribution}_j - d \, \frac{\sum_k \text{moral disposition}_k}{n} \left(1 - \text{moral disposition}_j\right) \text{return}_j \tag{A4'}$$
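
A possible reading of this credit function is sketched below: the cost of transgression is taken as the product of a scale factor d, the average moral disposition in the group (the level of compliance), the scope of the transgression (one minus the moral disposition of the action evaluated) and the return. Both this reading and the function name are assumptions made for illustration.

```python
def credit_with_transgression_cost(ret, posted, disposition, all_dispositions, d):
    """Credit of an action net of an assumed cost of transgression.

    ret, posted      : return and posted contribution of the action
    disposition      : moral disposition coded in the action
    all_dispositions : moral dispositions of all agents in the group
    d                : assumed scale factor of the cost of transgression
    """
    compliance = sum(all_dispositions) / len(all_dispositions)
    cost = d * compliance * (1.0 - disposition) * ret
    return ret - posted - cost
```

Taking d = 1 and an average moral disposition of 0.5, a free rider with a return of 250 and no posted contribution obtains a credit of 125, while a fully compliant agent with the same return and a posted contribution of 50 obtains 200; with d = 0 the original credit function is recovered.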



The interesting feature of this simulation is that the outcome now becomes 
dependent on the initial average moral disposition. For a low initial average moral 
disposition (0.15), free-riding still becomes dominant. However, for a slightly higher 
level of the initial moral disposition (0.25), the rule compliant behaviour becomes 
self-reinforcing. The implication we may derive is that the ‘external’ meta-agency 
may be a precondition for collective action only when the interaction is 
predominantly anonymous, or when the initial propensity of the agents to act in 
accordance with the shared rules is too low. 



9. The Origin of Shared Rules 

The Hobbesian solution may be realistic, but is it necessarily beneficial for the group? 
Besides, it leaves open at least one important question: Where do the shared rules that 
are enforced by meta-agency come from? 

The easiest explanation for the origin of the shared rules, and the first one to be 
‘discovered’ by mankind, is that they were created in the mind of (and enacted by) 
some kind of real or virtual meta-agent. If this line of explanation is excluded, a 
second one [10] may be contemplated: they exist because after having spontaneously 
emerged, they became functional to the society or the group. This approach, however, 
is problematic: it involves the explanation of a cause (shared rules as the cause of 
stable behavioural patterns) by its effects (the beneficial effects to the group or to the 
society) [9]. It may easily be translated into the notion that all existing institutions are 
necessarily beneficial, and it leaves out the important case of the rules of legislation 
that are deliberately enacted to achieve a certain purpose. 

Alternative explanations for the origin of shared rules are built on the following set 
of premises: (a) Intelligent individuals are able to recognise and evaluate the 
aggregated effects of shared rules; (b) These intelligent agents may even formulate 
theories that enable them to predict the outcome of alternative sets of shared rules and 
modes of meta-agency, and formulate preferences over these outcomes [4]; (c) They 
may further engage in tacit or formal ‘agreements’ about institutional arrangements 
that by influencing individual choices, ensure the group’s existence. From this 
perspective, the institutional framework is as much a spontaneous and non-intended 
result of an historic process as it is a product of a continuous negotiation among 
agents with possibly conflicting constitutional interests. Convergence of constitutional 
interests is a possible outcome of negotiation, but it seems more interesting to 
consider the existence of groups where a constitutional framework prevails without 
what might be defined as a voluntary agreement. As a matter of fact, it is not difficult 
to accept that one may submit to an order even if one’s constitutional interests conflict 
with it. The negotiation over the constitutional order may then be seen as a sort of 
game with temporary winners and losers, in which power counts. Morality can still be 
defined as a disposition to act in accordance with constitutional interests, but (since 
the agents’ constitutional interests may not converge) not as action in accordance with 
the prevailing shared rules. Institutional change may now be explained not only as 
the result of constant adaptations in individual minds, but also as the outcome of changing 
balances of power. Rule enforcement, when domination enters into the picture, can no 
longer be thought of as a neutral prerogative of some meta-agent. The Hobbesian 
story becomes a little less naive. 

Some of these implications may be explored with further modifications of our 
simulation model. In the model that was presented in section 8, an apportioning rule 
was assumed: an unspecified deliberation process led to the enactment of that rule. 
We are now interested in modelling: (a) the process through which the constitutional 
preferences of the agents may change as a result of their experience of the aggregated 
outcomes; (b) how the distribution of power within the group may be related to the 
constitutional design; and (c) how the constitutional regimes are related to the group 
welfare. 

The experimental setting and the model must be once again reformulated. Instead 
of announcing a contribution, the agent is now expected to abide by a minimum level 
of contribution (say 35). The meta-agent may decide to check if the envelope contains 
the minimum specified. If not, the agent will be sanctioned. In the model of the agent, 
a variable size is introduced representing the agent’s power, which, in the context of 
this model, is related to the greater or lesser weight of each agent in the decision 
process that leads to the adoption of an apportioning rule, and (depending on the 
apportioning rule) it may influence the size of each agent’s share of the collective 
benefits. The agent’s model also includes a rule for contribution and values assigned 
to two alternative apportioning rules that allow the agent to choose between them. The 
apportioning rule is now chosen by a voting procedure in which an agent’s size 
determines the weight of his vote. The value assigned by each agent to the 
constitutional rules is updated in every generation and is given by the agent’s average 
individual pay-off under each rule’s regime. The agent will vote for the rule that has 
greater value to him. The size of the agent is updated, assuming that in each 
generation a part of the individual payoff is ‘capitalised’. 

The collective return on investment is given by equation A2 and the apportioning 
rules are given by: 

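$$\text{Rule 1:}\quad \text{return}_i = \text{collective return} \times \frac{\text{contribution}_i}{\sum_k \text{contribution}_k} \qquad \text{(B1)}$$

$$\text{Rule 2:}\quad \text{return}_i = \text{collective return} \times \frac{\text{size}_i}{\sum_k \text{size}_k} \qquad \text{(B2)}$$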

For non-monitored actions, under both regimes, the credit of action j to agent i is 
assigned by: 

$$\text{credit}_{i,j} = \text{return}_{i,j} - \text{contribution}_{i,j} \qquad \text{(B3)}$$

and for monitored actions with contribution < 35 we have, 

$$\text{credit}_{i,j} = \text{return}_{i,j} - \text{contribution}_{i,j} - \text{penalty}_{i,j} \qquad \text{(B4)}$$

with $\text{penalty} = 10 \times (35 - \text{contribution})$. 



The collective payoff is given by (A5), and the size of one agent is updated 
according to: 



$$\text{size}_{i,t+1} = \text{size}_{i,t} + \frac{\text{return}_{i,t} - \text{contribution}_{i,t} - \text{penalty}_{i,t}}{100000} \qquad \text{(B5)}$$
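
As a minimal sketch of how equations (B1) to (B5) fit together, the following Python fragment computes each agent's return under the two apportioning rules, the credit (with the penalty for monitored under-contribution), and the size update. The function names and the example numbers are illustrative assumptions, not part of the original model code; the collective return is passed in as a plain number because equation A2 is defined elsewhere, and credit is read as return minus contribution minus penalty.

```python
from typing import List

MIN_CONTRIBUTION = 35    # minimum contribution stated in the text
PENALTY_FACTOR = 10      # penalty = 10 x (35 - contribution)
CAPITALISATION = 100000  # divisor appearing in (B5)


def apportion(collective_return: float, weights: List[float]) -> List[float]:
    """Split the collective return in proportion to the given weights.

    Rule 1 (B1): pass the contributions as weights.
    Rule 2 (B2): pass the agent sizes as weights.
    """
    total = sum(weights)
    if total == 0:
        # Assumption: nothing is distributed when all weights are zero.
        return [0.0 for _ in weights]
    return [collective_return * w / total for w in weights]


def penalty(contribution: float, monitored: bool) -> float:
    """Penalty applied only to monitored actions below the minimum."""
    if monitored and contribution < MIN_CONTRIBUTION:
        return PENALTY_FACTOR * (MIN_CONTRIBUTION - contribution)
    return 0.0


def credit(ret: float, contribution: float, monitored: bool) -> float:
    """(B3) for non-monitored actions, (B4) for monitored ones."""
    return ret - contribution - penalty(contribution, monitored)


def update_size(size: float, ret: float, contribution: float,
                monitored: bool) -> float:
    """(B5): a small part of the net payoff is 'capitalised' into size."""
    return size + credit(ret, contribution, monitored) / CAPITALISATION


if __name__ == "__main__":
    contributions = [40, 20, 0]   # hypothetical values
    sizes = [5, 5, 10]            # hypothetical values
    collective = 120.0            # hypothetical stand-in for equation A2
    print(apportion(collective, contributions))  # Rule 1
    print(apportion(collective, sizes))          # Rule 2
```

Under Rule 2 the third agent in the example receives the largest share while contributing nothing, which is the free-riding incentive that the simulations below explore.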



The simulation includes a training period of 100 generations during which no 
voting takes place and that is used by the agents to explore the regimes of the two 
rules, assigning values to them. Rule 2 is experienced in the initial fifty generations 
and rule 1 in the next fifty. In generation 101, and every 20 generations after that, 
a vote takes place that may change the rule regime. 

Simulation 1 - All agents are created equal with size 10 and the probability of 
monitoring is set to 0.9. The results show that in the first fifty generations (under rule 
2) the total contributions and collective payoffs tend to decrease after the initial 
adjustment. When full monitoring is not possible, apportioning benefits in a way that 
is not proportional to contributions leads to an inefficient outcome. After generation 
fifty (under rule 1), the contributions and payoffs start to recover, reaching values that 
are close to the maximum amount. After generation 100, when voting starts, there is 
unanimity on rule 1, which is kept until the end of the run. 

Simulation 2 - The size of each agent is now randomly generated varying between 
0 and 20. The results show that the comparatively bad performance of the first fifty 
generations (under rule 2) tends to improve under rule 1, between generations 50 and 
100. However, in generation 101, when it comes to voting, rule 2 wins - rule 2 has a 
majority of votes even though it does not have a majority of voters. In point of fact, 
rule 2 performs well for large agents and badly for small ones - the correlation 
between the value of rule 2 and the size of the agent is almost perfect. After 
generation 100 (under rule 2) the overall pattern of the collective payoffs and 
contributions is inefficient and rather unstable. 
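
The difference between a majority of votes and a majority of voters can be illustrated with a small size-weighted ballot. This is only a sketch: the function name, the tie-breaking choice and the numbers are assumptions made for illustration, and the values each agent assigns to the two rules are given directly rather than learned over generations.

```python
from typing import Dict, List, Tuple


def weighted_vote(sizes: List[float],
                  values: List[Dict[int, float]]
                  ) -> Tuple[int, Dict[int, float], Dict[int, int]]:
    """Each agent votes for the rule it values more; votes are weighted by size.

    Returns the winning rule, the total weight per rule, and the head count
    per rule. Ties are resolved in favour of rule 1 (an assumption).
    """
    weight = {1: 0.0, 2: 0.0}
    heads = {1: 0, 2: 0}
    for size, value in zip(sizes, values):
        choice = 1 if value[1] >= value[2] else 2
        weight[choice] += size
        heads[choice] += 1
    winner = 1 if weight[1] >= weight[2] else 2
    return winner, weight, heads


if __name__ == "__main__":
    # Hypothetical group: two large agents prefer rule 2, three small ones rule 1.
    sizes = [18, 15, 4, 3, 2]
    values = [
        {1: 10.0, 2: 30.0},
        {1: 10.0, 2: 25.0},
        {1: 12.0, 2: 4.0},
        {1: 11.0, 2: 3.0},
        {1: 9.0, 2: 2.0},
    ]
    print(weighted_vote(sizes, values))
```

In this hypothetical group rule 2 collects 33 of the 42 units of voting weight although only two of the five agents support it.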

The implication suggested by this model is that, given an unbalanced power 
distribution and high levels of monitoring, a rule regime that is not beneficial to the 
group may exist and be reproduced in time. Power matters, and when it is unevenly 
distributed within a group, there seems to be a clear relation between an agent’s 
choice over the set of constitutional rules and his relative position within the group. 
Since an order based only on consent and unanimous agreement remains a utopian 
prospect, when power is considered, Hobbes’ order may not be, after all, the best of all 
worlds. 



10. Conclusion 

This preliminary research is set up in a larger context where we intend to study 
economic choice in interactive contexts and the role of social norms and conventions 
in the shaping of the socio-economic order. We argued in favour of a redefinition of 
the economic model of man: in most interactive contexts moral dilemmas are 
inescapable; human beings are neither self-centred purpose-seekers nor 
pre-programmed rule-followers; they articulate action interests with higher order moral 
obligations. 

We interpreted the GA as a model of social learning and we used it to study a 
collective action problem. We are aware of the fact that the cognitive content (mind) 
of our agents is very poor and it is not our intention to argue that the evolutionary 
procedure used, and the interpretation we gave of it as a social process, is the only 
legitimate one. Other interpretations of the GA are possible, and other evolutionary 
algorithms might even lend themselves to richer and less unrealistic models. 
However, before moving on to more sophisticated models it may be worth exploring 
further the simpler ones, which, in spite of some strong assumptions, may have a 
more plausible behavioural content than the standard mathematical formalisms of the 
traditional rational choice models. 

The main contribution of these simulation experiments to the broader research 
program may be presented in the form of three tentative conclusions: (a) In large 
anonymous groups, collective action may not be sustained without an ‘external’ 
monitoring meta-agency; (b) but, if anonymity is relaxed, a distributed meta-agency 
may sustain collective action, provided that a sufficient initial level of compliance 
with the shared rules exists within the group; (c) given an uneven distribution of power, 
shared rules that are not beneficial to the group may persist. 

Abstract. We present two models of hierarchically structured multi-agent systems, and 
we describe how to obtain a modal knowledge base from distributed sources. We 
then propose a computationally oriented revision procedure for modal knowledge 
bases. This procedure is based on a labelled tableau calculus supplemented with a 
formalism to record the dependencies of the formulae. The dependencies are then 
used to reconstruct the minimal inconsistent sets, and the sub-formulae responsible 
for the inconsistencies are revised according to well-defined chains of modal 
functions. 



1 Introduction 

Individuals are able to build a model of the world, and so are institutions. In a common 
(even if a little idealized) version of this process of model building, it is assumed that 
a knowledge base is usually built using pieces of information collected from “outside”. 
The knowledge is only partial, and the process of acquiring data is unending. As data 
are acquired, they are also incorporated in theories: a knowledge base is not just a 
collection of facts, but also a system of rules connecting them. It is assumed that a set of 
rules is present from the beginning; new data are used for improving and refining it. It 
may happen that new (reliable) data are not fully compatible with the knowledge base, 
so we have to modify (revise) it to accommodate them. Moreover, data are acquired 
from several different sources, each one probing a sector of an environment (physical 
or conceptual). So, the process of building a knowledge base deals with distributed and 
partial data. As information collected from various sources may be contradictory (from 
one source we get p and from another source we get ¬p), using a modal language seems 
natural. In this way the incompatible pieces of information p and ¬p are represented as 
◇p and ◇¬p, meaning that both are possible. 

In section 2 we present two hierarchical models of agents; then, in section 3, we show 
how to construct modal knowledge bases arising from the above models. In sections 4 
and 5 we describe a revision procedure for modal knowledge bases, and in section 5.1 
we propose a tableau formalism to be used in the process of revision. 

2 Sensors, Agents, Supervisors 

The Basic Model (SA-model) The basic model, or SA-model, includes two components: 
a set of sensors and one agent. Sensors perform measurements and send the results 
to the agent. For the sake of simplicity we assume that the sensors share the same language, 
consisting of a set of properties P. When a sensor s performs a measurement and 
finds that a property p holds in its location, then we say that p holds at s. If the property 
p does not hold, then we say that ¬p holds. 

The agent, on the other hand, has a richer language than the sensors. First, its language 
includes modal operators allowing the coordination of information sent by sensors. 
When the sensor $s_i$ informs the agent that p holds at $s_i$, a fact $K_i p$ is added to the 
agent’s theory, meaning that the agent has received the piece of information p from the 
sensor $s_i$. Moreover, the language is supplemented with the modal operator □ and its 
dual ◇, conceived of as knowledge operators. The meaning of □ and ◇ is 

□p iff all sensors read p 
◇p iff there is at least one sensor which reads p. 
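
The reading of the two operators over sensor data can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: each sensor is represented by the set of literals it has read, a negative literal is written with a leading `~`, and the names `box` and `diamond` are chosen here for convenience.

```python
from typing import Iterable, Set


def box(literal: str, readings: Iterable[Set[str]]) -> bool:
    """Box: the literal is read by every sensor."""
    return all(literal in reading for reading in readings)


def diamond(literal: str, readings: Iterable[Set[str]]) -> bool:
    """Diamond: the literal is read by at least one sensor."""
    return any(literal in reading for reading in readings)


if __name__ == "__main__":
    # Two hypothetical sensors that disagree on p but agree on q.
    readings = [{"p", "q"}, {"~p", "q"}]
    print(diamond("p", readings), diamond("~p", readings))  # both possible
    print(box("p", readings), box("q", readings))
```

With contradictory readings of p, both `diamond("p", ...)` and `diamond("~p", ...)` hold while `box("p", ...)` fails, so the conflict is recorded without making the agent's data inconsistent. Note that `box` is vacuously true for an empty list of sensors, so the sketch presupposes at least one sensor.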

Second, the agent is equipped with a number of rules about the world. This means that 
the agent, contrary to the sensors, has a theory of the world, which may include rules 
expressed in a modal language, using a suitable system of modal logic. The theory the 
agent starts with might not be the right one; it might be the case that the information sent 
by the sensors is not compatible with the agent’s theory. In this case the theory must be 
changed in order to accommodate the new facts. 

Different kinds of theory change have been developed, according to different basic 
assumptions about the status of old and new information. The simplest case is that in 
which the sensors are reliable and the world is fixed. In this case new data are accepted and 
will never be contradicted by further evidence. If the agent’s knowledge base is partitioned 
into defeasible and non-defeasible formulae, then the only defeasible formulae are the 
rules the agent started with (this is the standard case of the AGM approach). Another 
case is that in which the sensors are reliable but the world can change. In this case 
(whose study was initiated by CUl) all formulae are defeasible except the last piece 
of information. When the sensors are not reliable some kind of preferential model is 
necessary. In these models, the (possibly contradictory) evidence coming from different 
sensors is weighted against a (pre-loaded or incrementally built) factor of confidence 
of the different sensors. In this way, evidence may also be rejected. We shall not pursue 
this line here, however. 



A Hierarchical Model (SAS-model) The basic model described in the previous section 
treats the sensors as slaves: they do not have a theory about the world, do not perform 
inferences, do not have to revise theories. The Agent is the only intelligent component of 
the system. However, a more complex model can be built assuming that a set of agents 
are connected to a supervisor. We may think of autonomous robots, having a certain 
degree of intelligence, which are coordinated by a central computer. Each robot, in turn, 
unleashes a number of slave sensors to gather data about the world. The supervisor has 
its own theory about the world, and a subset of the theory is shared by all the agents. It 
is a higher level theory, which is then specialized by lower level theories at the Agent 
level. In this case, theory revision (or update) can occur at two levels and information 
might flow in both directions. 

Different patterns of communication can be envisioned. The pattern we are concerned 
with is designed to minimize the interactions between levels: at each stage, only 
two (possibly non-adjacent) levels are involved. The difference between communicative 
events lies in the different status accorded to data sent by the sensors and by the 
supervisor. 

Van Linder et al. (H discussing a homogeneous environment, argue in favor of 
an individualistic concept: in their wording, seeing is stronger than hearing. In our 
structured, hierarchical environment, we prefer to explore the idea of a priority 
accorded to the communications downloaded from the central computer. 

Structuring a complex environment in hierarchical layers is a common strategy in 
order to keep communication complexity low. Accordingly, the burden of revision (and 
maybe decision) can be distributed across the layers. 

In the accompanying diagram we present a simple framework made of a 
supervisor S, which rules over two agents $a_1$ and $a_2$; each agent 
$a_i$ (i = 1, 2) has two sensors $s_{i1}$ and $s_{i2}$. The arrows determine the 
direction in which data flow. 

Agents and sensors behave as in the SA-model as long as they 
can, that is, they get through a cycle of events of the type: 

1a Sensors read data; 

2a Sensors send data to their agent(s); 

3a Agents get data; 

4a Agents revise their own theory against data sent by their sensors and the supervisor 
theory. 

If an agent cannot restore overall consistency, it means that the supervisor theory itself 
is not consistent with the facts the agent has access to. In this case, the agent asks for 
the intervention of the supervisor. The supervisor collects the data from all agents and 
revises its own theory in order to restore consistency: 

1b Supervisor gets data from the agents; 

2b Supervisor revises its belief system; 

3b Supervisor broadcasts the revised theory to the agents. 

However, it is possible that the theory is inconsistent for the agent but not for the supervisor; 
so we split the supervisor’s theory into two parts: the first consists of global rules, 
i.e., rules that are passed to all the agents; the second contains local rules, i.e., rules that 
hold only under particular circumstances, and are transmitted only to given agents. In 
an environment with n agents this is implemented by n + 1 sets: the set of the rules and 
a set of exceptions for each agent. A set of exceptions contains the rules that do not hold 
for a given agent. The rules passed to an agent are all the rules minus the exceptions 
for that agent. Once the supervisor determines inconsistencies occurring at the agent’s 
level but not globally, it adds the culprit rules to the respective sets of exceptions. Note 
that the revision cannot change a consistent overall theory (agent + supervisor) into an 
inconsistent one; at most, it can change an inconsistent overall theory into a consistent 
one. 
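
The bookkeeping with n + 1 sets can be sketched as follows. The class and method names are hypothetical, rules are represented as opaque strings, and the detection of culprit rules (which in the paper comes from the revision procedure) is not modelled: the culprits are simply handed to the supervisor.

```python
from typing import Dict, Iterable, Set


class Supervisor:
    """Holds one set of rules plus one set of exceptions per agent."""

    def __init__(self, rules: Iterable[str], agents: Iterable[str]) -> None:
        self.rules: Set[str] = set(rules)
        self.exceptions: Dict[str, Set[str]] = {a: set() for a in agents}

    def rules_for(self, agent: str) -> Set[str]:
        """Rules passed to an agent: all the rules minus its exceptions."""
        return self.rules - self.exceptions[agent]

    def add_exceptions(self, agent: str, culprits: Iterable[str]) -> None:
        """Record rules that fail locally for one agent but not globally."""
        # Only rules the supervisor actually holds can become exceptions.
        self.exceptions[agent] |= set(culprits) & self.rules


if __name__ == "__main__":
    sup = Supervisor({"r1", "r2", "r3"}, ["a1", "a2"])
    sup.add_exceptions("a1", {"r2"})   # hypothetical culprit for agent a1
    print(sorted(sup.rules_for("a1")))
    print(sorted(sup.rules_for("a2")))
```

Adding an exception only shrinks the rule set an agent receives, which matches the remark that the supervisor's revision cannot turn a consistent overall theory into an inconsistent one.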

The last step is on the agents’ side. They might simply accept the new theory, but this 
could result in overall theories that are too weak if, while the now revised supervisor 
theory was in effect, the agents had revised their own theories beyond need. So, they 
restore their original theory, and then revise it against sensors’ data and supervisor’s 
theory. 

1c Agents get the revised theory from supervisor; 

2c Agents restore their original theories; 

3c Agents revise their theory against data sent by their sensors and the Supervisor theory. 

The implementation of this scheme depends on how beliefs are represented and revised 
by actors. 

In order to make communication of rules (and rule modifications) feasible: 

a) the knowledge state of an actor is represented by means of a finite set of rules; 

b) the revision/update operation is computationally well-defined, and results in a finite 
set of rules; 

c) the revision (or update) operations should be as local as possible, that is, the change 
affects only a limited number of rules. 

Condition c) expresses another aspect of minimality, i.e., a minimal change of theory 
form, which complements the usual views about a minimal change of theory content. 

When an agent passes its revised rules to the supervisor, the latter has to treat the 
modal operators as local to the agent, that is, it has to interpret them in the same way 
the agent interpreted the sensor’s data. Agents at different levels of the hierarchy share 
the same logical structure, so that it is easy to extend the hierarchy through other levels. 
However, the knowledge states of actors at different levels differ. The fact that central 
rules are down-loaded to lower-level actors need not mean that all central rules are down-loaded. 
We simply say that relevant rules are exchanged. The simplest relevance criterion 
is syntactic: only a subset of the central rules is relevant to each lower-level actor. In 
turn, the choice might reflect a lexical criterion: each lower-level agent is aware of only 
a subset of the features of the world, and so can access only a subset of the vocabulary 
(atomic constants and predicate symbols). This just means that the supervisor has to 
account for the different fields of action of the agents. 



3 Modalities for a Multi-agent System 

Two main reasons suggest the use of modal logic for the multi-agent framework we 
propose: 1) the epistemic interpretation of the notions involved; 2) data are gathered 
from different sources, which can be conceived of as possible worlds. In the models two 
kinds of modalities are involved: $K_{ij}$ for a sensor $s_{ij}$, and □ (and its dual ◇) for the 
agents and the supervisor. 

In the SA-model we have an agent receiving data from n sensors, while in the SAS-model 
we have a unique supervisor supervising n agents. As in the SA-model, each agent 
$a_i$ (1 ≤ i ≤ n) has $i_m$ sensors. Both models share the same language, the same treatment 
of data, and the same revision methodology, but they differ in the representation of the 
knowledge bases involved. 

Since the sensors are reliable and the world is locally consistent (no sensor can read 
p and ¬p at once) we obtain that the $K_{ij}$ modalities are normal, which means they 
satisfy the following axiom 

$$K_{ij}(A \to B) \to (K_{ij}A \to K_{ij}B) \qquad (1)$$

The axioms connecting $K_{ij}$ with □ and ◇ are: 

$$\bigwedge_{i=1,\,j=i_1}^{n,\,i_m} K_{ij}A \equiv \Box A \qquad\qquad \bigvee_{i=1,\,j=i_1}^{n,\,i_m} K_{ij}A \equiv \Diamond A \qquad (2)$$



Each agent has at least one sensor able to perform measurements, and the supervisor 
supervises at least one agent; therefore from the above axioms we obtain □A → ◇A. 
Due to the epistemic interpretation of the modal operators □ and ◇, it is natural to 
assume the common axioms for positive introspection (□A → □□A) and negative 
introspection (◇A → □◇A). It is possible that agents or the supervisor are equipped 
with on-board sensors; in this case we add the axiom □A → A, so the resulting modal 
logics, in which we express the knowledge bases of the agents and the supervisor, are 
the well known systems D45 and S5. 
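
The step from axioms (2) to the seriality axiom can be spelled out as a short chain of implications. The derivation below is a sketch under two assumptions: that the axioms in (2) are read as equivalences, in line with the "iff" reading of □ and ◇ given in section 2, and that at least one sensor, here called $s_{i_0 j_0}$, exists.

```latex
\begin{align*}
\Box A &\to \bigwedge_{i,j} K_{ij}A
        && \text{(first axiom of (2), left to right)} \\
       &\to K_{i_0 j_0}A
        && \text{(the conjunction is non-empty)} \\
       &\to \bigvee_{i,j} K_{ij}A
        && \text{(disjunction introduction)} \\
       &\to \Diamond A
        && \text{(second axiom of (2))}
\end{align*}
```

Without the assumption that some sensor exists, the conjunction would be empty and the second step would not go through, which is why the text insists on at least one sensor per agent and at least one agent per supervisor.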

In the SA-model we have to deal only with the knowledge base of the agent, consisting 
of the pair $B_a = \langle \mathcal{F}, \mathcal{R} \rangle$, where $\mathcal{F}$ is the set of facts collected by the sensors and $\mathcal{R}$ 
is the set of rules. In the SAS-model we have a knowledge base for the supervisor, 
and a knowledge base for each agent. The knowledge base $B_s$ of the supervisor is 
$B_s = \langle \mathcal{F}, \mathcal{G}, \mathcal{E}_{a_1}, \ldots, \mathcal{E}_{a_n} \rangle$, where $\mathcal{F}$ is the set of facts collected by the sensors; $\mathcal{G}$ is the 
set of global rules; and each $\mathcal{E}_{a_i}$ is a subset of $\mathcal{G}$ containing the global rules that do not 
hold for the agent $a_i$. The knowledge base $B_{a_i}$ of an agent $a_i$ is described in terms of the 
triple $B_{a_i} = \langle \mathcal{F}_{a_i}, \mathcal{R}_{a_i}, \mathcal{G}_{a_i} \rangle$, where $\mathcal{F}_{a_i}$ is the set of facts collected by the sensors; 
$\mathcal{R}_{a_i}$ is the set of the agent’s local internal rules; and $\mathcal{G}_{a_i} = \mathcal{G} - \mathcal{E}_{a_i}$ is the set of down-loaded 
rules, i.e., the rules passed to the agent by the supervisor. 

We still have to see how data are passed to the agents and the supervisor; we identify 
a sensor with the set of information it has collected. So, if p ∈ $s_{ij}$ then p ∈ $\mathcal{F}_{a_i}$, and 
then $\mathcal{F} = \bigcup_{i} \mathcal{F}_{a_i}$. 

It is worth noting that all the pieces of information in the set of the data gathered by 
an agent or the supervisor are propositional and are transformed into modal form once 
passed to the supervisor. We argue that data in conditional and disjunctive form (e.g., 
p → q, p ∨ q) are meaningless, while we accept negative data, for example ¬p. Let us 
suppose that our sensors are cameras, whose optical field consists of n pixels. We have 
a wall where we have drawn lines of various lengths. We move our camera in front of 
each line; then p stands for “the length of the line is n”, where n is a given number of 
pixels, and ¬p means that the length of the line actually in front of the camera is not n 
pixels. In general a negative datum corresponds to a measurement beyond given bounds. 
Since we accept only measurements in conjunctive and negative form we can reduce the 
elements of each $\mathcal{F}$ to literals. 



4 Revision of Finite Modal Knowledge Bases 

We saw in section 2 that the revision mechanism is central to the behavior of the system. 
Different models have been described, depending on: a) which data are defeasible and 
which are assumed as certain; b) which modal characterization is given to the underlying 
theory. 

The revision engine that will be described in this section can perform modal 
revisions under the constraints of finiteness while implementing different schemes of 
defeasibility. It is a computationally oriented procedure for revision of finite knowledge 
bases. 

The classical AGM model ([Tl|, ifTSll ) has been recently criticized (see 1141 . f9ll . (T)) for 
being too liberal about constraints of finiteness and computation. The axiom of recovery, 
saying that the result of adding a piece of knowledge to the result of contracting it from 
a knowledge base returns the original base, has been criticized as being unnecessarily 
demanding (see [Bj, mi for a thorough analysis). 

Our procedure is computationally oriented and does not satisfy the recovery axiom; 
it yields a minimal change in the sense that the original rules of the knowledge base 
are retained, albeit in a modified form, as long as possible. The procedure as described 
here does not use information about entrenchment. It can yield non-trivial results also 
when no entrenchment information is available, contrary to what happens in the classical 
AGM approach, and it can be easily extended to manage such information when available. 

We use revision as the primitive operator, although it is often argued that contraction 
should be used as a primitive operator (see [E] ; see also m for a different view); however, 
it seems rather unnatural to suppose that our agents should change their minds about 
properties of the world were it not for the necessity of incorporating in their knowledge 
base a new fact. Exploration by means of sensors always yields new data to be added 
in some way into the existing corpus; it never offers a negative view. An exception 
should be made for the case in which a sensor tells its agent that a measurement already 
done is unreliable. But this amounts to going back to the rule set existing before that 
measurement, and revising it using all subsequent measurements except the one that was 
declared unreliable. This is not, however, the primitive case, but a rather complex and 
sophisticated one, which deserves the role of a derived operation. 

There is one more reason for not choosing contraction as a primitive operation, and 
it is connected with the choice of dealing with modal revision. The standard account for 
contraction goes as follows. Let us assume that base B implies proposition a. Then, for 
some reason, a has to be abandoned. So, we have to contract a from B. The reason for 
relinquishing a, however, is not that ¬a is found to hold, otherwise we should revise B by 
¬a. Rather, we feel unsure as to which of a and ¬a should be maintained. For instance, 
we perform repeated measurements, and sometimes we get a and sometimes ¬a. As we 
feel that both a and ¬a might be the case, we have no choice other than to contract a. 
This fits well with a modal approach, in which the only primitive operation is revision. 
Indeed, the revision/contraction contrast may be described in terms of modalities. The 
situation described can be restated in the following terms: the base B implies □a; then, 
we perform repeated measurements, and sometimes we get a and sometimes ¬a; this 
means that our theory must be revised by ◇a ∧ ◇¬a, properly expressing the fact that 
we feel that both a and ¬a might be the case. 

5 The Revision Procedure 

As noted before, the revision procedure starts when a formula is added to a set of 
formulae, and some contradiction would arise if no modification is made. 

Formulae are divided into two classes: defeasible and non-defeasible. For simplicity 
we shall call them, respectively, rules and facts, even if a non-defeasible formula might 
be, in fact, a rule (e.g., a down-loaded rule), and a fact could be defeasible (e.g., if 
we adopt the update point of view, in which only the last fact is non-defeasible). How 
this partition is made depends on the model of exchange and communication between 
the levels of the hierarchy. Both rules and facts are expressed in a modal propositional 
language. We set no restriction on the form of rules and facts. For the sake of simplicity, 
in the examples we shall assume that rules have the form A → B, where A and B 
are arbitrary modal formulae. If this is not the case, a simple manipulation will do (for 
instance, replacing an arbitrary formula C with ⊤ → C, where ⊤ is the atomic constant 
for true). 

The procedure is based on the following steps: 

1. find all minimal inconsistent subsets of sub-formulae of the original set of formulae; 

2. for any subset, weaken all the rules modally and propositionally; 

3. reconstruct the set of formulae starting from the revised subsets. 

The procedure yields a finite set; the process of finding the inconsistent subsets relies on 
the specific modal logic, and is computationally feasible thanks to the properties of KEM, 
the method we shall use to determine the minimal inconsistent sets (see section 5.1). 

The first step is rather standard in principle. Using KEM, however, makes it possible 
to construct all minimal inconsistent subsets of formulae at the same time the 
inconsistency is proved, resulting in much better efficiency. 
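
To make the notion of a minimal inconsistent subset concrete, the sketch below enumerates such subsets by brute force. It is deliberately not the KEM-based method of the paper: it assumes classical propositional formulae (no modalities), works on whole formulae rather than sub-formulae, represents each formula as a Python predicate over a truth assignment, and checks consistency by trying every assignment. All names are illustrative.

```python
from itertools import combinations, product
from typing import Callable, Dict, FrozenSet, List

Formula = Callable[[Dict[str, bool]], bool]


def consistent(formulas: List[Formula], atoms: List[str]) -> bool:
    """True if some truth assignment satisfies every formula (exponential)."""
    for values in product([False, True], repeat=len(atoms)):
        assignment = dict(zip(atoms, values))
        if all(f(assignment) for f in formulas):
            return True
    return False


def minimal_inconsistent_subsets(base: Dict[str, Formula],
                                 atoms: List[str]) -> List[FrozenSet[str]]:
    """Return the names of all minimal inconsistent subsets of the base."""
    found: List[FrozenSet[str]] = []
    names = sorted(base)
    # Subsets are visited by increasing size, so any inconsistent subset that
    # contains no previously found subset is itself minimal.
    for size in range(1, len(names) + 1):
        for subset in combinations(names, size):
            candidate = frozenset(subset)
            if any(m <= candidate for m in found):
                continue  # contains a smaller inconsistent subset
            if not consistent([base[n] for n in subset], atoms):
                found.append(candidate)
    return found


if __name__ == "__main__":
    # Hypothetical base: rules r1..r3 and facts f1, f2.
    base = {
        "r1": lambda v: (not v["p"]) or v["q"],   # p -> q
        "r2": lambda v: (not v["q"]) or v["r"],   # q -> r
        "r3": lambda v: (not v["s"]) or v["t"],   # s -> t
        "f1": lambda v: v["p"],                   # p
        "f2": lambda v: not v["r"],               # not r
    }
    print(minimal_inconsistent_subsets(base, ["p", "q", "r", "s", "t"]))
```

In the example only r1, r2, f1 and f2 take part in the contradiction, so r3 is left untouched by any revision that operates on the minimal inconsistent subsets.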

The second step, on the contrary, is not so common. When revision of finite bases is 
performed, it is rather standard to restore consistency by deleting one or more formulae 
from each inconsistent subset (that is, contracting the base) and then adding the new fact. 
Of course, deleting all of them results in a far from minimal mutilation of the knowledge 
base; on the other hand, if no extra-logical information is supplied as to which formula 
has to be deleted, deleting all of them is the only alternative to non-determinism. The 
choice is between a revision based on the so-called safe contraction (too demanding) and 
a non-deterministic revision. Our proposed algorithm keeps the flavour of a safe revision, 
in that it operates on all the formulae in the subsets, but does not delete them: we simply 
modify them in order to recover consistency. We want to avoid non-determinism, which 
might result in a disaster for the agents, while at the same time retaining as much information 
as possible. 

The third step is, again, rather standard; minimal inconsistent sets are deleted and 
replaced by the modified sets. 

The resulting formulae are still divided into rules and facts, so the base can be used as 
a base for further revisions. However, it should be noted that this division into facts and 
rules is not essential to the process. If a fact is the only new piece of code to be added, and 
all other pieces are on a par, the process again yields a reasonable result, contrary 
to what happens in the AGM frame, where we end up with just the consequences of the 
new fact. 

5.1 Modal Tableaux as Contradiction Finders 

A well known family of theorem proving methods is based on tableaux. A tableau is a 
tree whose nodes are labelled by formulae related to the formula to be proved. Tableaux-based 
methods aim to prove a formula by showing that there are no counter-examples 
to it, i.e., by showing that any assignment of truth values to the variables makes the 
negation of the formula false. While tableaux are usually employed as theorem provers, 
they are, literally, contradiction finders. In fact, in the development of a tableau we try 
to show that the formula which labels the node under scrutiny cannot be satisfied by 
any assignment of truth values. If the root of the tree is labelled not by a formula but 
by a set of formulae, then by showing that all subtrees are closed we prove precisely that the 
set is inconsistent. In this case, we may employ the information gathered in the process 
of finding the inconsistency to restore the consistency of the set of formulae through 
modification of some of the formulae involved. 

We shall use KEM which offers considerable advantages in terms of performance 
and flexibility in adapting to different modal systems (see (JJ). In order to use KEM 
in the process of identifying the sources of contradiction, we augment its formalism by 
adding a mechanism able to record which subset of sub-formulae of the original set of 
formulae is being used in the development of the branch, and so the subset involved in 
the contradiction which closes the branch. 

We start from a brief description of KEM (for a detailed exposition see Q). KEM 
is a labelled tableaux system based on a mixture of natural deduction and tableaux 
rules which uses labels to simulate the accessibility relation, and a unification algorithm 
to determine whether two labels denote the same world. It can be also used to check 
the consistency of a set of formulae, and information extracted from the tree helps the 
solving of not immediate contradictions; elsewhere 12] a preferences strategy connected 
to KEM has been adopted for the same problem. KEM uses two kinds of atomic labels: a
set of constant world symbols, $\Phi_C = \{w_1, w_2, \ldots\}$, and a set of variable world symbols,
$\Phi_V = \{W_1, W_2, \ldots\}$, that might be combined into path labels. A path is a label with
the following form $(i, i')$, where $i$ is an atomic label and $i'$ is either a path or a constant.
Given a label $i = (k, k')$ we shall use $h(i) = k$ to denote the head of $i$, and $b(i) = k'$ to
denote the body of $i$, where such notions are possibly applied recursively. $l(i)$ and $s^n(i)$
denote respectively the length of $i$ (the number of world symbols it contains), and the
sub-label (segment) of length $n$ counting from right to left. As an intuitive explanation,
we may think of a label $i \in \Phi_C$ as denoting a world (a given one), and a label $i \in \Phi_V$
as denoting a set of worlds (any world) in some Kripke model. A label $i = (k', k)$ may
be viewed as representing a path from $k$ to a (set of) world(s) $k'$ accessible from $k$ (or,
equivalently, the world(s) denoted by $k$).

Labels are manipulated in a way closely related to the accessibility relation of the 
logic we are concerned with. To this end it is possible to define logic dependent label 
unifications $\sigma_L$, which will be used in the course of KEM proofs. We start by providing
a substitution $\rho : \Im \to \Im$ on the set $\Im$ of labels, thus defined:

\[
\rho(i) = \begin{cases}
i & i \in \Phi_C \\
j \in \Im & i \in \Phi_V \\
(\rho(h(i)), \rho(b(i))) & l(i) > 1
\end{cases}
\]

From $\rho$ we define the unification $\sigma$ from which it is possible to define the appropriate
unifications for a wide range of modal and epistemic logics (see O), as follows: 



\[
\forall i, j, k \in \Im, \quad (i, j)\sigma = k \iff \exists \rho : \rho(i) = \rho(j) \text{ and } \rho(i) = k
\]
However, in this paper, we present only the $\sigma_L$-unifications for the logics we are
concerned with, namely $L = D45, S5$.

\[
(i, k)\sigma_{D45} = \begin{cases}
((h(i), h(k))\sigma, (s^1(i), s^1(k))\sigma) & l(i), l(k) > 1 \\
(i, k)\sigma & l(i), l(k) = 1
\end{cases}
\tag{3}
\]

\[
(i, k)\sigma_{S5} = \begin{cases}
((h(i), h(k))\sigma, (s^1(i), s^1(k))\sigma) & (h(i), h(k))\sigma \neq (s^1(i), s^1(k))\sigma \\
(h(i), h(k))\sigma & \text{otherwise}
\end{cases}
\]



Example 1. It can be seen that $((W_1, w_1), (W_2, w_1))\sigma_{D45}$, but the paths $(W_1, w_1)$ and
$w_1$ do not $\sigma_{D45}$-unify; this corresponds to the fact that $\Box A \to \Diamond A$ holds in D45, but
$\Box A \to A$ does not.
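The label operations and the two logic-dependent unifications can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the KEM implementation: labels are encoded as strings and nested pairs, constants are written `w1` and variables `W1`, the head of an atomic label is taken to be the label itself, the segment used in the D45 and S5 clauses is assumed to be $s^1$, and repeated variables are not checked for consistent bindings. All function names are hypothetical.

```python
from typing import Optional, Tuple, Union

# A label is an atomic world symbol (a string) or a path (head, body).
# Assumed convention: constants look like "w1", variables like "W1".
Label = Union[str, Tuple[str, "Label"]]


def is_variable(a: Label) -> bool:
    return isinstance(a, str) and a[:1] == "W"


def head(i: Label) -> Label:
    # h(i); for an atomic label we return the label itself (an assumption).
    return i[0] if isinstance(i, tuple) else i


def length(i: Label) -> int:
    # l(i): number of world symbols in the label.
    return 1 if isinstance(i, str) else 1 + length(i[1])


def segment(i: Label, n: int) -> Label:
    # s^n(i): sub-label of length n, counting from right to left.
    while length(i) > n:
        i = i[1]
    return i


def unify(i: Label, j: Label) -> Optional[Label]:
    """Basic sigma-unification: a constant matches only itself, a variable
    may be mapped to any label, and paths unify componentwise."""
    if is_variable(i):
        return j
    if is_variable(j):
        return i
    if isinstance(i, str) or isinstance(j, str):
        return i if i == j else None
    h, b = unify(i[0], j[0]), unify(i[1], j[1])
    return None if h is None or b is None else (h, b)


def unify_d45(i: Label, k: Label) -> Optional[Label]:
    if length(i) > 1 and length(k) > 1:
        h = unify(head(i), head(k))
        s = unify(segment(i, 1), segment(k, 1))
        return None if h is None or s is None else (h, s)
    if length(i) == 1 and length(k) == 1:
        return unify(i, k)
    return None  # mixed lengths: neither clause of the definition applies


def unify_s5(i: Label, k: Label) -> Optional[Label]:
    h = unify(head(i), head(k))
    s = unify(segment(i, 1), segment(k, 1))
    if h is None or s is None:
        return None
    return (h, s) if h != s else h


# Example 1: the first pair unifies in D45, the second does not.
print(unify_d45(("W1", "w1"), ("W2", "w1")))
print(unify_d45(("W1", "w1"), "w1"))
```

Under this encoding the pair $(W_1, w_1)$, $w_1$ does unify with `unify_s5`, which matches the reflexivity of S5.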

In defining the inference rules of KEM we shall use labelled signed formulae, where
a labelled signed formula (LS-formula) is an expression of the form $X, i$ where $X$ is
a signed formula and $i$ is a world label. Given a modal formula $A$ and a label $i$, the LS-formulae
$TA, i$ and $FA, i$ represent the assertion that $A$ is true or, respectively, false at the world(s)
denoted by $i$. $TA$ and $FA$ are conjugate formulae. Given a signed formula $X$, by $X^C$
we mean the conjugate of $X$.

In the following table signed formulae are classified according to Smullyan-Fitting 
unifying notation 0. 



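\[
\begin{array}{lll|lll}
\alpha & \alpha_1 & \alpha_2 & \beta & \beta_1 & \beta_2 \\ \hline
TA \wedge B & TA & TB & FA \wedge B & FA & FB \\
FA \vee B & FA & FB & TA \vee B & TA & TB \\
FA \to B & TA & FB & TA \to B & FA & TB
\end{array}
\]

\[
\begin{array}{ll|ll}
\nu & \nu_0 & \pi & \pi_0 \\ \hline
T\Box A & TA & T\Diamond A & TA \\
F\Diamond A & FA & F\Box A & FA
\end{array}
\]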



We shall write $[\alpha_1, \alpha_2]$ and $[\beta_1, \beta_2]$ to denote the two components of an $\alpha$-formula
(respectively, of a $\beta$-formula).
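The classification in the table can be sketched as a small Python function. The encoding is an assumption made for illustration only: a signed formula is a pair of a sign (`"T"` or `"F"`) and a formula, formulae are nested tuples such as `("imp", "p", "q")` or `("box", "p")`, and atoms are strings. Negation is left out because the table does not list it.

```python
def classify(sign, formula):
    """Return (kind, components) following the Smullyan-Fitting notation.

    kind is one of "alpha", "beta", "nu", "pi" or "atom"; components is the
    list of signed sub-formulae ([a1, a2], [b1, b2], [nu0] or [pi0]).
    """
    if isinstance(formula, str):
        return ("atom", [])
    op = formula[0]
    if op in ("and", "or", "imp"):
        a, b = formula[1], formula[2]
        table = {
            ("T", "and"): ("alpha", [("T", a), ("T", b)]),
            ("F", "or"): ("alpha", [("F", a), ("F", b)]),
            ("F", "imp"): ("alpha", [("T", a), ("F", b)]),
            ("F", "and"): ("beta", [("F", a), ("F", b)]),
            ("T", "or"): ("beta", [("T", a), ("T", b)]),
            ("T", "imp"): ("beta", [("F", a), ("T", b)]),
        }
        return table[(sign, op)]
    if op in ("box", "dia"):
        # T box A and F dia A are nu-formulae; T dia A and F box A are pi-formulae.
        is_nu = (sign, op) in {("T", "box"), ("F", "dia")}
        return ("nu" if is_nu else "pi", [(sign, formula[1])])
    raise ValueError(f"unsupported connective: {op}")


def conjugate(signed):
    # X^C: same formula, opposite sign.
    sign, formula = signed
    return ("F" if sign == "T" else "T", formula)


# A rule A -> B asserted as true is a beta-formula with components FA and TB.
print(classify("T", ("imp", "p", "q")))
```

Read this way, every rule asserted in the root is a $\beta$-formula, which is why the labels recorded for rules take the form $l.\beta_1$ and $l.\beta_2$.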

KEM builds a tree whose root is labelled with the set of signed formulae $\{TA_1, \ldots, TA_n\}$
and the label $w_1$, corresponding to the assertion that all propositions $A_1, \ldots, A_n$
in the original rule set are true in the actual world. Branches are built by means of
inference rules which derive new signed formulae which hold in specific sets of worlds.
In doing so, they build new signed formulae and new world label strings. While the
rules for deriving new signed formulae depend only on propositional logic, the rules
for deriving world label strings depend on the specific modal logic at hand. A branch
is closed when it contains a signed formula $X$ and its conjugate $X^C$ which hold in the
same world. If all branches are closed, the tree is closed, and the root is contradictory.

A characteristic of KEM is the analyticity property, that is, all signed formulae are 
sub-formulae of one of the original formulae. This is accomplished by limiting the 
cut-rule to one component of a $\beta$-formula.

In order to gather information about the inconsistencies of the set of rules, we enrich 
KEM with three new sets of labels. The first one records the (component of the) original 
signed formula from which the signed formula labelling the node derives. The second 
one records the set of (components of the) original signed formulae used in deriving 
the signed formula labelling the node. The third one records the ad hoc assumptions
made by applications of the cut rule. These three labels will be denoted by $l$, $sl$ and $c$,
respectively.

A difference from the usual procedure of KEM is that even if all the branches are 
closed, we continue deriving new nodes until all the original formulae are used in all 
branches. When the procedure stops, we have a number of nodes where the branches 
close. The $sl$ label, together with the $c$ label, identifies a minimal inconsistent set of
rules. The rules in $sl$ are to be revised in order to restore consistency.



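$\alpha$-rule

\[
\begin{array}{lllll}
[\alpha_1, \alpha_2] & i & l & sl & c \\ \hline
\alpha_1 & i & l.\alpha_1 & sl - \{l\} \cup \{l.\alpha_1\} & c \\
\alpha_2 & i & l.\alpha_2 & sl - \{l\} \cup \{l.\alpha_2\} & c
\end{array}
\]

$\beta$-rule

\[
\begin{array}{lllll}
[\beta_1, \beta_2] & i & l & sl & c \\
\beta_n^C \ (n = 1, 2) & j & l' & sl' & c' \\ \hline
\beta_{3-n} & (i, j)\sigma_L & l.\beta_{3-n} & sl \cup sl' - \{l\} \cup \{l.\beta_n\} & c \cup c'
\end{array}
\]

$\nu$-rule

\[
\begin{array}{lllll}
\nu & i & l & sl & c \\ \hline
\nu_0 & (i', i) & l & sl & c
\end{array}
\qquad i' \in \Phi_V \text{ new}
\]

$\pi$-rule

\[
\begin{array}{lllll}
\pi & i & l & sl & c \\ \hline
\pi_0 & (i', i) & l & sl & c
\end{array}
\qquad i' \in \Phi_C \text{ new}
\]

$\bot$-rule

\[
\begin{array}{lllll}
X & i & l & sl & c \\
X^C & j & l' & sl' & c' \\ \hline
\bot & (i, j)\sigma_L & nil & (sl \cup \{l\} - c) \cup (sl' \cup \{l'\} - c') & c \cup c'
\end{array}
\]

cut rule

\[
\begin{array}{c}
[\beta_1, \beta_2] \quad i \quad l \quad sl \quad c \\ \hline
\beta_1 \quad i \quad l.\beta_1 \quad \emptyset \quad c \cup \{l.\beta_1\}
\quad \Big| \quad
\beta_1^C \quad i \quad l.\beta_1 \quad \emptyset \quad c \cup \{l.\beta_1\}
\end{array}
\]

(similarly when $\beta_2$ is used)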

Soundness and completeness for the above calculus are given in [2ll- 

The minimal contradictory sets may contain labels of the form $l.s$ where $l$ is the
number of a formula and $s$ is a string built by tokens belonging to the set $\{\alpha_1, \alpha_2, \beta_1,
\beta_2\}$, with dots between. This means that the subformula of $l$ identified by $s$ is responsible
for the contradiction. The other components of the formula $l$ can be retained; only $l.s$
has to be weakened. So it is the structure of the labels that tells us which components have
to be weakened.



5.2 How to Weaken the Formulae Responsible for Contradiction 

Given a set of rules and facts which yields a contradiction, we may restore consistency by 
weakening the rules, that is, the defeasible ones. Rules may be weakened both modally 
and propositionally. Some of the elements of a minimal contradictory set might be facts, 
that is, non-defeasible formulae. If all elements are facts, no revision can be made; there 
is no way of restoring consistency. If, on the contrary, at least some of the formulae are 
rules, we can weaken them and restore consistency. 

Let $\{A_{ki} \to B_{ki}\}$ be the set of rules in the $k$-th minimal contradictory set, and $\{F_j\}$
be the set of facts in the $k$-th minimal contradictory set. The propositional weakening of
$A_{ki} \to B_{ki}$ is the rule $(\bigvee_j \neg F_j \wedge A_{ki}) \to B_{ki}$. As all facts $F_j$ hold, the antecedent of the
weakened rule is false, and the inference of $B_{ki}$ is blocked. The exact form depends on
the sub-formulae involved. In turn, the exact form of the modal weakening of a rule
depends on the modal system we are working in.
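The effect of the propositional weakening can be checked on a small, purely propositional instance. The sketch below is an illustration under assumptions, not the authors' procedure: formulae are nested tuples with string atoms, a rule is a pair (antecedent, consequent), the facts are plain propositional formulae rather than modal ones, and the list of facts is assumed to be non-empty. The names `holds`, `weaken` and `fires` are hypothetical.

```python
from functools import reduce


def holds(formula, valuation):
    """Evaluate a propositional formula under a truth assignment."""
    if isinstance(formula, str):
        return valuation[formula]
    op = formula[0]
    if op == "not":
        return not holds(formula[1], valuation)
    if op == "and":
        return holds(formula[1], valuation) and holds(formula[2], valuation)
    if op == "or":
        return holds(formula[1], valuation) or holds(formula[2], valuation)
    raise ValueError(f"unsupported connective: {op}")


def weaken(rule, facts):
    """Turn A -> B into ((not F1 or ... or not Fn) and A) -> B."""
    antecedent, consequent = rule
    negated = [("not", f) for f in facts]
    guard = reduce(lambda x, y: ("or", x, y), negated)  # facts must be non-empty
    return (("and", guard, antecedent), consequent)


def fires(rule, valuation):
    # A rule fires when its antecedent holds.
    return holds(rule[0], valuation)


# The rule p -> q clashes with the facts p and not q.
rule = ("p", "q")
facts = ["p", ("not", "q")]
world = {"p": True, "q": False}  # a valuation in which all facts hold

print(fires(rule, world))                  # the original rule would derive q
print(fires(weaken(rule, facts), world))   # the weakened rule is blocked
```

Because the guard is a disjunction of negated facts, it can only become true if one of the facts is later withdrawn, at which point the weakened rule behaves like the original one.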

In general, the modal weakening of $A_i \to B_i$ includes an antecedent weakening
and a consequent weakening, which are, in a sense, dual of each other. In order to fix
ideas, let's consider the antecedent weakening. We want to substitute $A_{ki} \to B_{ki}$ with
$\sigma_i(A_{ki}) \to B_{ki}$, where $\sigma_i(A_{ki})$ is a modal expression such that $\sigma_i(A_{ki}) \to A_{ki}$ but the
converse does not hold; $\sigma_i(A_{ki}) \to B_{ki}$ is weaker than $A_{ki} \to B_{ki}$, in the sense that it
is more difficult to find a counter-example to $\sigma_i(A_{ki}) \to B_{ki}$ than it is for $A_{ki} \to B_{ki}$;
this may be expressed by saying that the set of models for $\sigma_i(A_{ki}) \to B_{ki}$ is greater than
the set of models for $A_{ki} \to B_{ki}$. However, if we want to perform a revision according
to some criteria of minimality, we need to put some constraints on the choice of $\sigma_i$.
It seems to us that the following constraints are reasonable:

a) $\sigma_i(A_{ki})$ is a modal function of $A_{ki}$, that is, it can be built without any information
other than $A_{ki}$.

b) $\sigma_i(A_{ki})$ is a positive modal function, that is, no negation except those possibly
contained in $A_{ki}$ should be used.

c) We should use the weakest modal expression among those satisfying a) and b), in
order to obtain some kind of minimal revision.

There are two reasons for using only positive modal functions. First, in using arbitrary
modal functions we might turn a true formula into a false one. Assume that we revise
$A \to B$ by $A \wedge \Diamond\neg A \to B$. Now, if $A$ happens to be an identically true formula, it is true
also in all the accessible worlds, and $A \wedge \Diamond\neg A$ is identically false. While the original
rule was equivalent to $B$, the new one is simply irrelevant. Second, by limiting ourselves
to positive modal functions we impose constraints on the set of worlds in which the
antecedent must hold in order to derive the consequent, while using arbitrary functions
we are no longer able to give such an interpretation to our revised rule.

Condition c) may be difficult to satisfy, because it may be difficult to determine a
unique minimal modal expression. We already said that we want to avoid non-determinism
and excessive mutilation of the knowledge base. Using these guidelines, we satisfy
condition c) by means of the following construction (given a modal logic):

c.1) build the poset $[D(A), \to]$, where $D(A)$ is the set of the positive modal functions
of $A$;

c.2) intersect all the chains (linearly ordered subsets) of $[D(A), \to]$ that include $A$ itself;
this will be a chain itself;

c.3) $\sigma_i(A)$ is the greatest element of the chain such that $\sigma_i(A) \to A$; it is the element
nearest to $A$ “from below”.

Whether step c.3) can be performed or not depends, of course, on the system of logic
we work in. However, it can be shown that $\sigma_i(A)$ exists for some common logics. For
example, for the logics D45 and S5 described earlier it is possible to define exactly the
result of the weakening process. This can be expressed using a chain of modal functions
of an arbitrary formula $\varphi$. It can be proved that the chains of modal functions in D45
and S5 are

\[
\varphi \wedge \Box\varphi \to \varphi \wedge \Diamond\varphi \to \varphi \to \varphi \vee \Box\varphi \to \varphi \vee \Diamond\varphi
\qquad \text{(chain of modal functions in D45)}
\]

\[
\Box\varphi \to \varphi \to \Diamond\varphi
\qquad \text{(chain of modal functions in S5)}
\]

Steps c.2) and c.3), respectively, guarantee safeness (intersecting all the chains is similar
to weakening all the formulae) and minimality.
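One way to read the construction is as a walk along the chain: weakening an antecedent moves one step towards the stronger end, weakening a consequent moves one step towards the weaker end, and when no step is left the rule can only be discarded. The Python sketch below follows that reading. It is an assumption-laden illustration: the chains are written as lists of strings ordered from strongest to weakest, with `x` standing for the formula, `[]` for box and `<>` for diamond, and the function names are hypothetical.

```python
# Chains of positive modal functions, ordered from strongest to weakest.
CHAINS = {
    "S5": ["[]x", "x", "<>x"],
    "D45": ["x & []x", "x & <>x", "x", "x | []x", "x | <>x"],
}


def weaken_antecedent(logic, form):
    """Next stronger modal function in the chain, or None if form is the top.

    Raises ValueError if form does not belong to the chain of the logic.
    """
    chain = CHAINS[logic]
    k = chain.index(form)
    return chain[k - 1] if k > 0 else None


def weaken_consequent(logic, form):
    """Next weaker modal function in the chain, or None if form is the bottom."""
    chain = CHAINS[logic]
    k = chain.index(form)
    return chain[k + 1] if k + 1 < len(chain) else None


# Revising a rule []p -> []q in S5:
print(weaken_antecedent("S5", "[]x"))   # no stronger antecedent is available
print(weaken_consequent("S5", "[]x"))   # the consequent []q becomes q
print(weaken_consequent("S5", "x"))     # a second revision turns q into <>q
```

Keeping the chain as explicit data makes the choice deterministic: for a given logic and formula there is at most one candidate at each step, which is the point of intersecting the chains.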

Similarly, the weakening of the consequent results from substituting $B_{ki}$ with $\sigma_i(B_{ki})$,
a modal expression built using only $B_{ki}$ and no extra negations such that $B_{ki} \to \sigma_i(B_{ki})$
but the converse does not hold.

As a simple example, let us suppose that the rule $A \to B$ is responsible for a
contradiction; let us also assume that the only distinct modal affirmative functions of
$x$ in the language are $\Box x$ and $\Diamond x$, and that there is a predicate $C$ whose value is true.
Then, the original rule can be weakened by substituting it with three rules:

\[
\Box A \to B \qquad A \to \Diamond B \qquad A \wedge \neg C \to B
\]

Even if the three rules allow a number of non-trivial derivations, they no longer allow
the derivation of $B$.

Some justification is due for using modal expressions with no extra negations. This 
amounts to using positive modal functions as weakening sensors. There are two reasons 
for that. In revising the antecedent, we use the weakest suitable modal expression (the 
strongest for revising the consequent). This might be insufficient to block the inference 
responsible for the contradiction. If an inconsistency is still found, we must use a “stron- 
ger” weakening, if available. If not, we have no more options, other than to discard the 
rule. In S'5 we have no second options, due to the fact that = □«. In 7945, however, 
more possibilities are at hand. If only one of the components of a /3-formula is included 
in the contradictory set, then the other can be safely weakened the standard way, while 
the component actually implied could need a stricter weakening. 

It must be noted that we add facts, but only rules are revised. This means that if an
inconsistent set includes only facts, no revision is possible: inconsistent facts cannot be 
reconciled. 

5.3 How to Reconstruct the Set of Formulae Starting from the Revised Subsets 

In order to reconstruct a new set of rules from the revised rules in the contradictory sets,
we have first of all to make sure to include all the sub-rules not to be revised. This can
be easily done by adding to the original set the complement of the rules in one of the
contradictory sets. Some examples will clarify the matter. Let $A_1 \vee A_2 \to B_1 \wedge B_2$ be
the original rule $l$, and let $l.\beta_1.\alpha_2$ be in the contradictory set. This means that the sub-rule
actually implied is $A_2 \to B_1 \wedge B_2$. We find it by following the structure of the label
$l.\beta_1.\alpha_2$: break the rule in the first $\beta$-component, taking the second $\alpha$-component. If the
labels were $l.\beta_1.\alpha_2$ and $l.\beta_2.\alpha_1$ then the rule implied would be $A_2 \to B_1$. In our case, the
other sub-rule $A_1 \to B_1 \wedge B_2$ may be safely added to the original set of rules. Then we
have to add the revised rules. In our example, we should add $A_2 \wedge \bigvee_j \neg F_j \to B_1 \wedge B_2$ as
the propositional weakening, and the rule $\Box A_2 \to B_1 \wedge B_2$ as the modal weakening. As
the third step, we must delete from the set of rules all sub-rules in the contradictory sets
and all the parent rules. The necessity of also deleting the sub-rules in the contradictory
sets stems from the possibility of reintroducing a rule piecewise, one sub-rule from each
of the sets. Then we have to check for consistency of the modified rules. It is enough
to check the consistency of the rules resulting from the revision of the rules in the
contradictory sets plus the ad hoc assumptions (the set labelled by $c$ in the tableaux).

6 Examples



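Example 2. In this example, we employ the simplest non-trivial
structure, i.e., a supervisor $S$, an agent $a$ and two sensors
$s_1$ and $s_2$ arranged as depicted beside, and show how the revision
procedure works, assuming S5 as the logic for the agent and the
supervisor.

In this example we deal with the knowledge bases $B_a = (\mathcal{F}_a, \mathcal{L}_a, \mathcal{G}_a)$
for the agent $a$, and $B_S = (\mathcal{F}, \mathcal{G}, \mathcal{E}_a)$.

We start with $\mathcal{L}_a = \{\Box p \to \Box q\}$, $\mathcal{G} = \{\Diamond q \to \Box r\}$, and $\mathcal{E}_a = \emptyset$, therefore
$\mathcal{G}_a = \mathcal{G}$. The sensors gather the following information: $s_1 = \{p\}$ and $s_2 = \{p, \neg q\}$,
which are passed to the agent in the following form: $\{K_1 p, K_2 p, K_2 \neg q\}$, that means
$\mathcal{F}_a = \{\Box p, \Diamond\neg q\}$. The agent checks the consistency of its own knowledge base by
means of the KEM-tree starting with the union of the elements, namely $\mathcal{L}_a \cup \mathcal{G}_a \cup \mathcal{F}_a$.

\[
\begin{array}{lrlllll}
\mathcal{G}_a & 1 & T\,\Diamond q \to \Box r & w_1 & 1 & 1 & \emptyset \\
\mathcal{L}_a & 2 & T\,\Box p \to \Box q & w_1 & 2 & 2 & \emptyset \\
\mathcal{F}_a & 3 & T\,\Box p & w_1 & 3 & 3 & \emptyset \\
 & 4 & T\,\Diamond\neg q & w_1 & 4 & 4 & \emptyset \\
 & 5 & T\,\Box q & w_1 & 2.\beta_2 & 2.\beta_1, 3 & \emptyset \\
 & 6 & F\,q & (w_2, w_1) & 4 & 4 & \emptyset \\
 & 7 & T\,q & (W_1, w_1) & 2.\beta_2 & 2.\beta_1, 3 & \emptyset \\
 & 8 & \bot & (w_2, w_1) & - & 2.\beta_1, 2.\beta_2, 3, 4 & \emptyset
\end{array}
\]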

Notice that we have deleted all the inessential steps. In fact, it is immediate to see that no
other contradictions can be derived from the above tree. The set of formulae responsible
for the contradiction is $\{2.\beta_1, 2.\beta_2, 3, 4\}$; however only $2.\beta_1$ and $2.\beta_2$ should be revised
in so far as 3 and 4 are facts. We apply the revision function obtaining $\sigma_i(\Box p) \to \Box q$
and $\Box p \to \sigma_i(\Box q)$. The first fails, since $\Box p$ is already at the top of the chain, whereas the
second succeeds, being $q = \sigma_i(\Box q)$. Therefore the revised set of internal rules consists
of $\mathcal{L}'_a = \{\Box p \to q\}$. At this point the agent has restored consistency and the sensors may
collect new pieces of information. Let us assume that the new data are $s_1 = \{p, \neg r\}$
and $s_2 = \{p, \neg q\}$. The new set of facts turns out to be $\mathcal{F}'_a = \{\Box p, \Diamond\neg q, \Diamond\neg r\}$. Again,
the agent runs the KEM tree for its knowledge base.



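\[
\begin{array}{lrlllll}
\mathcal{G}_a & 1 & T\,\Diamond q \to \Box r & w_1 & 1 & 1 & \emptyset \\
\mathcal{L}'_a & 2 & T\,\Box p \to q & w_1 & 2 & 2 & \emptyset \\
\mathcal{F}'_a & 3 & T\,\Box p & w_1 & 3 & 3 & \emptyset \\
 & 4 & F\,\Box q & w_1 & 4 & 4 & \emptyset \\
 & 5 & F\,\Box r & w_1 & 5 & 5 & \emptyset \\
 & 6 & T\,q & w_1 & 2.\beta_2 & 2.\beta_1, 3 & \emptyset \\
 & 7 & F\,\Diamond q & w_1 & 1.\beta_1 & 1.\beta_2, 5 & \emptyset \\
 & 8 & F\,q & (W_1, w_1) & 1.\beta_1 & 1.\beta_2, 5 & \emptyset \\
 & 9 & \bot & w_1 & - & 1.\beta_1, 1.\beta_2, 2.\beta_1, 2.\beta_2, 3, 5 & \emptyset
\end{array}
\]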

The contradiction arises from $\{1.\beta_1, 1.\beta_2, 2.\beta_1, 2.\beta_2, 3, 5\}$, but only $2.\beta_1$, $2.\beta_2$ have to
be revised by the agent: 3 and 5 are facts and 1 is a global rule that can be revised
only by the supervisor. The revision function leads to $\sigma_i(\Box p) \to q$, $\Box p \to \sigma_i(q)$, and
$\Box p \wedge \Box r \to q$. The first is not applicable for the same reason as in the previous case,
the second produces $\Box p \to \Diamond q$ and the third is the propositional weakening of 2. The
resulting state

\[
\mathcal{F}'_a, \qquad \mathcal{L}''_a = \{\Box p \to \Diamond q, \ \Box p \wedge \Box r \to q\}, \qquad \mathcal{G}_a = \{\Diamond q \to \Box r\}
\]

is inconsistent.



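\[
\begin{array}{rlllll}
1 & T\,\Diamond q \to \Box r & w_1 & 1 & 1 & \emptyset \\
2 & T\,\Box p \to \Diamond q & w_1 & 2 & 2 & \emptyset \\
3 & T\,\Box p \wedge \Box r \to q & w_1 & 3 & 3 & \emptyset \\
4 & T\,\Box p & w_1 & 4 & 4 & \emptyset \\
5 & F\,\Box q & w_1 & 5 & 5 & \emptyset \\
6 & F\,\Box r & w_1 & 6 & 6 & \emptyset \\
7 & T\,\Diamond q & w_1 & 2.\beta_2 & 2.\beta_1, 4 & \emptyset \\
8 & T\,\Box r & w_1 & 1.\beta_1 & 1.\beta_2, 2.\beta_1, 4 & \emptyset \\
9 & \bot & w_1 & - & 1.\beta_1, 1.\beta_2, 2.\beta_1, 4, 6 & \emptyset
\end{array}
\]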


It is easy to see that the inconsistency arises from the conjunction of local and global rules,
namely from $\Box p \to \Diamond q$ and $\Diamond q \to \Box r$; therefore the agent notifies the inconsistency
to the supervisor. However the supervisor recognizes a consistent state, being $\mathcal{F} = \mathcal{F}_a$
and $\mathcal{G} = \mathcal{G}_a$. At this point the agent has to revise again the culprit rule (i.e., $\Box p \to \Diamond q$);
unfortunately the only way to revise it is to delete it, since no more modal weakenings are
possible. The resulting set of local rules consists of $\Box p \wedge \Box r \to q$.

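Example 3. In this example we assume a slightly more complex structure
to illustrate the use of exceptions. In this framework we have
a supervisor $S$, two agents $a_1$ and $a_2$, and each agent has a single
sensor. According to our model the knowledge bases are:
$B_S = (\mathcal{F}, \mathcal{G}, \mathcal{E}_{a_1}, \mathcal{E}_{a_2})$, $B_{a_1} = (\mathcal{F}_{a_1}, \mathcal{L}_{a_1}, \mathcal{G}_{a_1})$, and $B_{a_2} = (\mathcal{F}_{a_2}, \mathcal{L}_{a_2}, \mathcal{G}_{a_2})$,
where $\mathcal{E}_{a_1} = \mathcal{E}_{a_2} = \emptyset$ and $\mathcal{G} = \{\Box p \to \Diamond q\}$. Therefore
$\mathcal{G} = \mathcal{G}_{a_1} = \mathcal{G}_{a_2}$. The sensors collect the following data: $s_1 = \{p, \neg q\}$, $s_2 = \{p, q\}$;
then the sets of local facts are: $\mathcal{F}_{a_1} = \{\Box p, \Box\neg q\}$ and $\mathcal{F}_{a_2} = \{\Box p, \Box q\}$. It is
immediate to see that $\mathcal{F}_{a_1}$ and $\mathcal{G}_{a_1}$ are inconsistent, and the contradiction is due to the
global rule $\Box p \to \Diamond q$. The supervisor gathers the data from all the sensors, obtaining
$\{K_1 p, K_2 p, K_1 \neg q, K_2 q\}$, which implies $\mathcal{F} = \{\Box p, \Diamond\neg q, \Diamond q\}$. However, $\mathcal{F} \cup \mathcal{G}$
is consistent, so the supervisor adds $\Box p \to \Diamond q$ to the exceptions of $a_1$ ($\mathcal{E}_{a_1}$). The
new knowledge bases are: $B_S = (\mathcal{F}, \mathcal{G}, \mathcal{E}'_{a_1}, \mathcal{E}_{a_2})$ and $B_{a_1} = (\mathcal{F}_{a_1}, \mathcal{L}_{a_1}, \mathcal{G}_{a_1})$, where
$\mathcal{E}'_{a_1} = \{\Box p \to \Diamond q\}$, and, consequently, $\mathcal{G}_{a_1} = \mathcal{G} - \mathcal{E}'_{a_1} = \emptyset$.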

The procedure for dealing with exceptions can be viewed as a special kind of modal
weakening. Due to the equivalence $(\top \to A) \equiv A$, global rules can be conceived of as
consequents of conditional rules whose antecedent is an always true formula. We then
apply the revision function $\sigma_i(A)$ obtaining $\Diamond A$, which means that $A$ holds somewhere.
$\Diamond A$ restores consistency, but is too weak for our purposes: we do not know where $A$
holds. However the supervisor knows the agent where the exception does not hold, so
instead of changing the formula, it adds it to the set of exceptions.




7 Conclusion 

In this paper a model of theory revision based on a hierarchy of agents is explored. In 
order to coordinate data acquisition from different agents, a modal language is used. We 
extend to the modal case ideas originally put forth for dealing with purely propositional
knowledge. The problem of combining data from different sources is important per se, 
and a long standing priority for AI. Besides, the hierarchical models may be seen as first 
steps toward a fully distributed model, in which each agent builds and maintains a model 
of the other agents’ knowledge. 

Abstract. Even a simple artificial agent will have to make decisions and
consider their consequences. An agent can achieve this by evaluating situations
along several dimensions. The BVG (Beliefs, Values and Goals) architecture is
a means to build up such an evaluation machinery. The motivational part of the
agent’s mind is captured by the concept of value, which expresses the preferences
of the agent and cannot be reduced to either beliefs or goals, or a
combination of the two. We sketch a calculus for agent decision, and look into
the problem of feeding the consequences of actions back into the agent’s
decision mechanism, in order to adjust its behaviour for future occasions.
Finally, we describe an experiment where these ideas are put to the test and look
into its results to draw conclusions about their general applicability.



1 Introduction 

Let us consider a setting in which autonomous agents are inserted into a shared 
environment and have to cope with scarcity of resources, complexity, dynamism and 
unpredictability. By agents we mean entities that will act either on behalf of someone
(say, their user) or simply representing themselves. By autonomous we mean that 
those agents have motivations of their own, and can and will decide what to do based 
upon their own self interest. The simple fact that we expect our environment to have 
more than one agent, together with the notion of autonomy just given, ensures that our 
world is unpredictable and complex, and cannot be dealt with by computing in 
advance all possible courses of action, and so be prepared for every circumstance. Our 
agents will have to be smarter than that. 



1.1 Decisions and Rationality: Utility 

The problem we will be addressing is that of decision, that is, how an agent can
choose from a given set of alternative candidate actions. What justifies this choice?
What are the expected consequences of that choice? How do these expectations help
illuminate the choice process (deliberation)?

In classical decision theory the concept of utility emerges as a possible answer to 
all these questions. Utility is a way of representing the preferences of the agent. Utility 
theory says that every state of the world has a degree of usefulness, or utility, to an
agent, and that the agent will prefer states with higher utility. Utility theory can be
combined with probability theory to build the Principle of Maximum Expected Utility
(PMEU) that states that an agent is rational if and only if it chooses the action that
yields the highest expected utility, averaged over all the possible outcomes of the
action [21].
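As a concrete reading of the PMEU, the following minimal Python sketch computes the expected utility of each action as the probability-weighted average of the utilities of its outcomes and picks the action with the highest value. The action names, probabilities and utilities are invented purely for illustration.

```python
def expected_utility(outcomes):
    """outcomes: list of (probability, utility) pairs for one action."""
    return sum(p * u for p, u in outcomes)


def pmeu_choice(actions):
    """Return the action whose expected utility is highest."""
    return max(actions, key=lambda name: expected_utility(actions[name]))


# Hypothetical decision problem with two candidate actions.
actions = {
    "stay": [(1.0, 0.0)],
    "risk": [(0.5, 3.0), (0.5, -1.0)],
}

print(expected_utility(actions["risk"]))  # 0.5 * 3.0 + 0.5 * (-1.0)
print(pmeu_choice(actions))
```

Note how much the agent must already know for this to work: a complete list of outcomes, a probability for each, and a single utility scale on which all outcomes can be compared.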

One of the constraints Russell and Norvig propose as central to utility theory is 
orderability. Orderability requires the agents to either prefer one of any two given 
states, or be indifferent between the two. It may well be the case that an agent does not
relate two given states at all, and any attempt at comparing the two would be
prejudicial, and proven wrong when new information arrives. Moreover, in a complex,
dynamic environment, it is impossible to have strong beliefs about almost anything. 

Further criticism of utility-based decision is summarised in [16]. In particular, the authors
criticise the assumption that all the agents have knowledge of the payoff matrix, and therefore full
knowledge of the other agents’ preferences. Of course, assuming rationality is defined
as utility maximising, this would amount to an agent being fully capable of predicting 
all the other agent’s choices. The world is necessarily closed, and becomes 
predictable, given enough computational power. 

The case here is not one of rationality versus irrationality. We are discussing the 
basic requirements an agent should fulfil to use utility theory for decision. And those 
requirements are too demanding on the capabilities of the agent. Surely if an agent 
fulfilled those, utility theory would be a rather strong candidate to inform a decision 
theory. The problem is that agents cannot raise themselves to be so perfect. As [24] 
notes, “a large body of evidence shows that human choices are not consistent and 
transitive as they would be if a utility function existed.” 

But if we want to challenge the concept of utility, and search for a substitute, let us 
look at the following classical example: the centipede game [11]. Imagine two players, 
A and B, are allowed to play a game. In the first move, A can choose to end the game 
with payoff zero for himself and zero for B. Alternatively, A can continue the game 
and let B play. B ends the game by either choosing payoff -1 for A and 3 for himself 
or 2 for both A and himself. 

The rational decision for A is to immediately end the game and receive zero, 
because the worst case alternative is to receive -1 in the next move. Anyway, if A 
fails to choose this, the rational decision for B would clearly be to receive 3, leaving A 
with -1. However, empirical experiments show that people tend to cooperate to reach 
the (2, 2) payoff [22]. 
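The ‘rational decision’ described here is the one obtained by backward induction, which can be sketched in a few lines of Python. The encoding of the game tree and the move names (`end`, `continue`, `defect`, `cooperate`) are assumptions made for this illustration; the payoffs are those given in the text, written as (payoff for A, payoff for B).

```python
# Players are indexed 0 (A) and 1 (B).
GAME = {
    "player": 0,
    "moves": {
        "end": {"payoff": (0, 0)},
        "continue": {
            "player": 1,
            "moves": {
                "defect": {"payoff": (-1, 3)},
                "cooperate": {"payoff": (2, 2)},
            },
        },
    },
}


def backward_induction(node):
    """Return (payoff, moves) when each player maximises its own payoff."""
    if "payoff" in node:
        return node["payoff"], []
    player = node["player"]
    best = None
    for move, child in node["moves"].items():
        payoff, path = backward_induction(child)
        # Keep the move that gives the current player the highest payoff.
        if best is None or payoff[player] > best[0][player]:
            best = (payoff, [move] + path)
    return best


print(backward_induction(GAME))
```

The procedure never reaches the (2, 2) outcome, even though both players would prefer it to (0, 0); this is the gap between the prescribed and the observed behaviour that the example is meant to expose.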

Are all these people behaving irrationally? Or is it the notion of rationality that 
could be redefined? Do we have ‘higher values’ to take into consideration here? 
Which are they and how do they influence this decision? Let us quote from Simon 
again to ask: “What realistic measures of human profit, pleasure, happiness and 
satisfaction can serve in place of the discredited utility function?” 



1.2 BDI Agents 

The Belief-Desire-Intention (BDI) agent model [20, 7, 25, 10] is a different attempt
at solving the decision-action problem. Basically, agents commit to the choices they
have made (and that have turned desires into intentions), and do not change this
commitment until some significant change occurs (the intention is fulfilled or it
becomes impossible to fulfil). [12], and also [19], look at what happens when a BDI
agent is inserted into a world with a varying rate of change. The commitment strategy is
put to the test and performs rather well, although subsequent additions prove to be
needed, especially to avoid excessively fanatical agents. Another important issue that
remains is how to implement BDI agents, since modal logic is usually used to
describe such systems, but proves difficult to translate into operative programs.

In [25] a clear distinction is made between deliberation (the process of deciding 
what to do) and action selection (deciding how to do it). They are especially interested 
in the reconsideration of intentions, and so keep desires out of their agent architecture. 
It is still an issue to see where intentions come from. In [19], it is the deliberation
mechanism that provides a serialisation of intentions, through the use of payoff-based and
utility-based functions. Utilities are also used in [25] to define an optimal action
selection function, but nothing is said about how to build it.

In either case, the question of choice remains interesting and unexplored. In most 
BDI models, the deliberation process filters through the desires to provide intentions. 
But little is said about such a process, except the logical features the resulting 
intentions must possess: they must be consistent with the previously existing 
intentions, they must be achievable in accordance with the agent’s beliefs, and so on. 
It is too clean a choice process, in the sense that all we have is technical reasons to
choose or not. We lack real-world, ‘dirty’ reasons for choice, the preferences of the
agent. The truth is that in the real world our choices aren’t that clean: we choose some
things apparently for no reason, just because we want to give it a try; we persist in
activities although we’re not sure whether we will ever be successful; we are not
rational. Or are we? This is the question we will be trying to answer in the rest of the 
paper. 



1.3 Overview of the Paper 

In the next section we provide an answer to this question by postulating values as the 
mental objects that allow choice to be made. We evaluate situations and possible 
courses of action against several dimensions, and use this multiple evaluation to 
finally choose the action to perform. Moreover, the outcome of that action is
fed back into the agent in order to enhance future decisions. In section 3, we give
form to these ideas, by providing an architecture for our agents. In section 4, we
address a case study to show how such an agent operates in a real problem. Finally we 
draw some conclusions and look at issues for future work. 



2 Multiple Values 

The move away from utility-based decision we make here has been done previously in
different manners. In [23], the attempt was to base social interaction on the theory of
social dependency [8, 9]. Some agents’ abilities complement other agents’ needs.
Keeping in mind that an agent’s behaviour is fundamentally determined by its own
motivations, social interactions (such as goal adoption, social exchange, etc.) are
determined by the agent’s place in the network of social dependences involving it. So,
the whole notion of sociality is based on this relational concept of dependence,
allowing for such interesting forms of social situations as manipulation and
exploitation, as well as exchange and cooperation.

In the present work, the emphasis is on the individual mental components and
mechanisms an agent must possess in order to give meaning to the expression ‘pursuit
of self-interest.’ In a previous paper [1], we have proposed the use of multiple values
as a new kind of mental object that will have a fundamental role in decision-making,
and so in defining the character of the agents. A value is a dimension against which a
situation can be evaluated. By dimension we mean a non empty set endowed with an 
order relation. Most interesting situations from the decision standpoint will have 
several such dimensions, and so most decisions are based on multiple evaluations of 
the situation and alternative courses of action. 

2.1 Values for Choice 

In a similar line of reasoning, Simon has proposed the notion of aspiration level: 
“Aspirations have many dimensions: one can have aspirations for pleasant work, love, 
good food, travel, and many other things. For each dimension, expectations of the 
attainable define an aspiration level that is compared with the current level of 
achievement. (...) There is no simple mechanism for comparison between dimensions. 
In general, a large gain along one dimension is required to compensate for a small loss 
along another — hence the system’s net satisfactions are history-dependent, and it is 
difficult for people to balance compensatory effects.” 

Values can be said to constitute the motivational part of the agent’s mind. Different
agents decide differently in the same situation because they base those decisions on
different value systems. The same agent can decide differently in the same situation
because its value system has evolved as a result of the interaction between the agent 
and the world. 

It is possible that the several evaluations do not all agree in selecting the same 
candidate for execution. Since they belong to different dimensions, it may not be easy to 
collapse these evaluations into a function that serialises the candidates. However, the 
agent must decide, so such a function is needed. It could then be argued that this 
function is the decision function, and is equivalent to a utility function. Apparently, in 
this case there would be no need for the multiple evaluations setting: they would all 
amount to components of the decision function. On the contrary, we defend the 
independent existence of the evaluations along the multiple dimensions, for a 
number of reasons: dimensions can be added or removed, and can be irrelevant or 
especially relevant in some situations; agents can communicate about them, 
exchanging experiences, adopting new measures or even dimensions from other 
agents; the evaluation functions along each dimension can evolve over time. The 
existence of such objects in mind allows for greater flexibility in a number of 
situations than would be possible if all were collapsed into some kind of generalised 
utility function. There is also the argument that such a function does exist, as a kind of 
‘general currency’ in mind, one dimension that would serve as a conversion basis for 
all others. [18] proposes pleasure (and pain) for this role, which agrees with the 
behavioural theories of reinforcement and punishment. This proposal is questionable, 
but worth further thought. 

2.2 Accounts of Value 

In this section we briefly examine different approaches in the literature to the concept 
of value. We will try to remain as close as possible to the multi-agent systems 
literature, so that the focus of the approaches is comparable to ours. A more detailed 
analysis can be found in [1]. 

The account that we found most comprehensive was that of Miceli and 
Castelfranchi [13, 14, 15]. They base their notion of value upon the notion of 
evaluation, which is a special kind of belief involving a goal. We almost entirely 
subscribe to the summary of [15], partially included herein: (i) there are in mind 
bridge-objects between goals (ends) and beliefs (assumptions): evaluations, values, or hybrid 
objects like emotions or attitudes; (ii) evaluations are inserted in the structuration and 
in the processing common to all the knowledge; (iii) there is a separability between 
the evaluator and the possessor of the goal; (iv) factual and evaluative elements are 
mingled in language statements; (v) there is a multitude of cognitive processes for 
evaluation; (vi) values exist in mind and are mental objects in their own right, with 
specific cognitive and social functions; (vii) values are unarguable, and irrational. 

However, we wouldn’t agree that values can be included in the class of beliefs, 
since they have a motivational power that one wouldn’t expect beliefs to have. Notice 
that values are not goals either, since they can possess some informational content. We 
don’t look at a value as something to be pursued per se, for instance through the 
creation and achievement of the corresponding goal. A value is rather a primitive 
mental object that cannot be reduced to beliefs or goals, or any combination of the 
two. Of course, values give rise to some beliefs, and this process can be better 
understood when those values are conscious, which might not always be the case. 
They might give rise to some goals as well, but the most important role that values play is 
that of parametrising the choices made by the agents. We very rarely 
know how to fulfil some value. Think of friendship: if one values friendship, what can 
one do to achieve it? It is more likely that this value is present when making concrete 
choices among courses of action. One would think: ‘I’ll be closer to this value of 
friendship if I choose this action instead of that other.’ 

Miceli and Castelfranchi’s work did not devote a lot of attention to the dynamic 
aspects of value processing, and the definitions they provide reflect this. They can be 
used as a starting point for experimental work, which can put their coherence to the test 
and show new paths for the development of this subject. They can lead the way to 
new, operative definitions. 

Not many authors have addressed the notion of value in an intelligent agents 
context, and even when it is addressed, it is rarely explicitly included in the agent 
architecture. Normally it is translated into some functionality of the system, or simply 
dropped. An example of this is the work of Kiss (see [17], chapter 9). He views the 
agent’s goals and values as attractors in the state-space that represents the 
(multi-agent) system. But goals and values are kept together, and no attempt is made at 
telling them apart, even when describing the notion of potential field that allows for 
the definition of attractor. This potential is interpreted in agent-theoretic terms as a 
‘hedonic,’ or ‘satisfaction’ metric, but he then places this notion under the ‘affect’ 
category in agent theory. Anyway, we are close to an economic utility notion of 
preference (or value, for that matter), a one-dimensional metric over the state-space. 

In [5], Blandford characterises an agent’s long-term values as goals “which guide its 
choice between alternative possible actions, but which are not achievable.” The agent 
“has beliefs about how well particular types of action satisfy its values” (page 116). 
There is some confusion: the author cannot decide exactly where to use ‘goals’ and 
where to use ‘values,’ and in fact finishes by using ‘ends.’ Our interpretation is that 
values are end-goals, and also meta-goals, and cannot be achieved. An example of a 
non-achievable goal is that of “being rich and famous,” since “there is never a time 
when one is likely to say “OK, achieved that; now I’ll go and do something else!”” 
(page 120). These goals are dealt with as values. We would characterise that goal not as 
‘non-achievable’ but as a ‘maintenance goal,’ one that needs to be first accomplished 
and afterwards maintained, and so, far from a value. 



2.3 Basic Mental Objects 

In the next section we present the BVG agent architecture. BVG stands for ‘Beliefs, 
Values and Goals,’ and is intentionally similar to BDI (Beliefs, Desires and 
Intentions). However, this new organisation gives a stronger support to the power of 
choice of an agent, during the decision stage. Before we take a deeper look into this 
architecture, let us examine these mental objects and explain why these were chosen 
for the basic building blocks of our agents. 

The intention here was to simplify as much as possible the architecture where we 
wanted to introduce the concept of value. So we started by taking beliefs, and grouped 
together the pro-active mental objects under the umbrella designation of goal. Of 
course there are several types of goals, and intentions are certainly among the most 
important and useful, whereas desires are usually allowed the flexibility and freedom 
of representing anything the agent might want, without further concerns. We aim at 
skipping these technical details, so that we concentrate on what will be new: the use of 
values to inform choice (see for example [25] and their inability to face this issue). 
Beliefs represent what the agent knows; goals represent what the agent wants; values 
will represent what the agent likes. 

Once this introduction of values is accomplished, we can expand our architecture in 
ways that have been pointed out by BDI research, or others. The mechanism of goals 
is still in need of further investigation, and depends heavily upon architectural 
details [8]. Even beliefs, which seem simpler, pose all sorts of difficult 
questions still to be addressed [6, 19]. We will take the general classes of 
informational and motivational attitudes (represented by beliefs and goals), and leave 
their specialisation until later. 

We can subscribe almost entirely to the architectural options in [25]. However, we 
kept desires out of the picture for different reasons. We believe that the 
agent’s goals can come from suitably informed beliefs. It will be a further role of 
values to help select among the believed things which ones to pursue. But this role of 
values in deliberation will be left for further research as we examine the choice 
process. 



2.4 Adaptation over Time 

In section 2.1 we mentioned that the value system of an agent could evolve over time. 
In this section we will address the ways in which this can be done. 

When an agent makes a decision, it will monitor the features of the world that are 
related to the consequences of that decision. In particular, it will check whether the 
action was successful or not. The agent updates its belief and goal structure according 
to what has happened as a result of its action, and the information concerning that 
decision and its consequences will be used to enhance the decision system for future 
situations. 

The agent has to feed back the relevant information and use it to readjust its value 
system. This means that the agent must evaluate the quality of its previous evaluation, 
the one that led to the decision. This raises the question of what information to 
feed back and how to use it. This mechanism is depicted in figure 1. 




Before we propose alternative ways to tackle this issue, let us note that again the 
evaluation of the decision’s quality could serve as a candidate for the general decision 
function. That is, if this function constitutes the ultimate assessment of the quality of 
the decisions, why not use it for the decision itself? The answer is that none of these 
functions is expected to be optimal, and so it is a better option to let all of them 
operate freely, each enhancing its contribution to the general behaviour. It could even be the 
case that the optimal behaviour does not exist, and there is no sense in looking for the 
perfect decision function. The world is too open and unpredictable for that to be 
possible. Furthermore, as we shall see, it could be just as difficult to use this 
fed-back information as it is to solve the decision problem itself. 

We propose, as a way out, three alternative ways of assessing the quality of the 
decision. The first is to consider some measure of ‘goodness’ of the decision, that will 
assess to what extent the action was successful. The simplest case is to test for success 
or failure. In a more promising approach, the action is evaluated and some grade (say, 
a real number belonging to [0, 1]) is assigned to the decision and 
corresponding action. This was the option we used in [2], where we designed a game 
strategy that used the assessment of our opponent’s move to dynamically adapt our 
estimation of her strength, in order to allow for exploitation of her possible 
weaknesses. 

The second alternative is to observe in the world the consequences of the selected 
action in terms of the same dimensions that the agent used for decision. For instance, 
if the agent used time and money as the key dimensions upon which the decision was 
based, then it should observe what the consequences were of that decision in terms of 
time and money themselves. This should ease up the problem of updating the system 
of values, since there are no mediators to consider between the evaluation and its 
assessment. 

Finally, we propose to make the assessment of the decision by measuring its results 
in terms of the set of dimensions that the designer of the agent is interested in. When 
we design an agent, we are interested in obtaining some behaviour, and we can 
postulate some dimensions and corresponding measures to evaluate the quality of the 
behaviour we obtain and compare it with the desired one. But it is not necessarily the 
case that when designing the agent’s decision machinery, we use those exact 
dimensions and measures we had in mind. To evaluate the agent’s performance with 
our own measures (that it did not have access to) amounts to looking for emergent 
behaviour. 

It is not obvious how to use this fed-back information to enhance decision-making, 
just as it is not obvious how to design the decision mechanism in the first place. 
We propose to conduct experiments with the mechanisms we have just described, to 
gain insight into these problems. Our conjecture is that a lot of ad hoc design options 
must be made in each concrete decision setting. However, some structural features can 
be obtained, as we shall see in section 4. 




Fig. 2. The BVG architecture 



3 The BVG Agent Architecture 

The BVG architecture highlights the role of choice in the behaviour of the 
agents (see fig. 2). As we have seen, the agent is kept as simple as possible in what 
concerns its basic blocks. A structure of goals and beliefs is completed by values, 
which will be used by the choice mechanism to obtain a decision. These values are not 
rigid: they can and will be changed during the interaction. 



3.1 The Use of Values 

The agent operates as follows. In each cycle of operation, the agent’s beliefs and 
evaluations are updated. This is similar to the function ‘next state’ in [25], which is 
thought of as a belief revision function. It determines the new set of beliefs of the 
agent, taking as basis the present set of beliefs and also the state of the world. This 
new set of beliefs includes any new information that originates from the agent’s 
perceptions. 

Afterwards, deliberation occurs, during which the agent computes the decision to 
be taken and the corresponding action to be executed. The selected action is then 
executed, and the world is updated accordingly. The agent notes these changes in the 
world in the immediate cycle. A more asynchronous scheme would certainly be more 
realistic, allowing for pre-emptive processes of reasoning, but for the moment, this 
scheme suffices to gain insight into the problem at hand. 

In [19], the choice machinery is embedded in the deliberation process. The act and 
reasoning cycles are intertwined in order to maintain the intention structure (a 
time-ordered set of tree-structured plans). The choice machinery is associated with the 
reasoning cycle, where potential additions to the intention structure are considered by 
the filtering and deliberation processes (using a kind of aggregated utility function). 
Also, means-ends reasoning (a special purpose route-planner) can be performed to 
produce new options. 

Now we focus on our choice mechanism. Values are used to relate goals with 
beliefs. Each goal is characterised by a set of values that work as standards (something 
like Simon’s aspiration levels) against which candidate sub-goals will be matched. 

$$\mathrm{Goal}_{((V_1,\,v_1),\ \ldots,\ (V_n,\,v_n))}\ g \tag{1}$$

Plans are represented as beliefs. The values that inform these beliefs represent the 
characterisations the agent associates with each of the alternative plans. For instance, 
we could have: 



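$$\mathrm{Bel}_{((V_1,\,v_{11}),\ \ldots,\ (V_n,\,v_{1n}))}\ s_1 \rightarrow g \tag{2}$$

$$\mathrm{Bel}_{((V_1,\,v_{21}),\ \ldots,\ (V_n,\,v_{2n}))}\ s_2 \rightarrow g \tag{3}$$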



We now have the conditions to make our choice. For that purpose, we need a 
calculus that collapses the several measures over the values into a function that picks 
the best option. For instance, a calculus based on [24] would pick $s_i$ such that $i$ is 
the minimum for which every measure $v_{ij}$ satisfies the corresponding standard $v_j$. 
In general, we will have a choice function $F$ that 
will take $g$ and $s_i$ as arguments and assign each $s_i$ a real number that will allow for the 
serialisation of the candidates. The chosen candidate will be proposed as a new goal 
for execution, and so it needs to be assigned its own standards. 

$$\mathrm{Goal}_{((V_1,\,F_1(v_1,\,v_{i1})),\ \ldots)}\ s_i \tag{4}$$

As an example, consider that our value $V_1$ is a deadline for the achievement of goal 
$g$; $v_{i1}$ would be the amount of time the alternative $s_i$ is believed to spend. So we should 
take $F_1(x, y) = x - y$, so that the sub-goal $s_i$ has $v_1 - v_{i1}$ as deadline. 
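
The assignment of standards to the chosen sub-goal can be sketched as follows. This is a minimal illustration under our own assumptions: the function name `subgoal_standards`, the dictionary representation and the figures 60 and 15 are hypothetical; only the deadline combiner F(x, y) = x - y comes from the text.

```python
from typing import Callable, Dict

# One combiner per dimension: (goal standard, candidate measure) -> sub-goal standard.
Combiner = Callable[[float, float], float]


def subgoal_standards(goal: Dict[str, float],
                      candidate: Dict[str, float],
                      combiners: Dict[str, Combiner]) -> Dict[str, float]:
    """Give the chosen candidate its own standards, dimension by dimension."""
    return {dim: combiners[dim](goal[dim], candidate[dim]) for dim in goal}


# Deadline example from the text: F(x, y) = x - y.
combiners = {"time": lambda x, y: x - y}
goal = {"time": 60}         # deadline of the goal g (illustrative figure)
candidate = {"time": 15}    # time the alternative is believed to spend (illustrative)

standards = subgoal_standards(goal, candidate, combiners)
```

Other dimensions would need their own combiners; the text leaves their form open, so none is suggested here.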



3.2 The Insertion of a BVG Agent into Its World 

Now we insert the agent in its world according to this previous scheme. The feedback 
of the previous choice and corresponding action is made by looking at the appropriate 
features in the world. In a more elaborate architecture, this mechanism would be 
driven by expectations; for now we will use the appropriately chosen beliefs, and 
handle this information in the decision process. 

When the decision is made, it gives rise to a new sub-goal to be fulfilled. The 
decision mechanism assigns the corresponding values for the sub-goal standards, and 
keeps track of the performance of its execution. When the goal is tentatively fulfilled, 
the quality of this execution is assessed (in our present example, we only check for 
success or failure) and the standards are updated in the super-goal. This means that 
when some goal is executed, the quality of this execution reflects not on itself, but on 
the choice that led to it. If the subordinate fails, we blame the one that wrongly chose 
this candidate over the others. 

The values that characterise our plans can be updated as well. If some alternative 
did not perform as we expected, the features that led to it being chosen should be 
revised. 

It may seem that one of the two revisions could be enough to reflect the 
information observed into the system of values. But a short example will show 
otherwise. Imagine we have 

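$$\mathrm{Goal}_{(\ldots,\ (\mathrm{Time},\,100),\ \ldots)}\ g \tag{5}$$

and that the alternative goal $g_1$ was chosen among the candidates: 

$$\mathrm{Bel}_{(\ldots,\ (\text{Probability of success},\,0.8),\ (\mathrm{Time},\,10),\ \ldots)}\ g_1 \rightarrow g \tag{6}$$

$$\mathrm{Bel}_{(\ldots,\ (\mathrm{Time},\,20),\ \ldots)}\ g_2 \rightarrow g \tag{7}$$

If $g_1$ fails we should not only update its own description as a plan (e.g., 
$\mathrm{Bel}_{(\ldots,\ (\text{Probability of success},\,0.7),\ (\mathrm{Time},\,10),\ \ldots)}\ g_1 \rightarrow g$) but we should also update $g$: 
$\mathrm{Goal}_{(\ldots,\ (\mathrm{Time},\,90),\ \ldots)}\ g$, since $g_1$ has spent 10 time units. 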
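
The two revisions in this example can be sketched in a few lines. The classes `Goal` and `Plan` and the function `revise_after_failure` are hypothetical names introduced only for illustration; the new probability (0.7) is passed in explicitly because the example gives it without stating a rule for computing it.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Goal:
    name: str
    standards: Dict[str, float]


@dataclass
class Plan:
    name: str
    values: Dict[str, float]


def revise_after_failure(goal: Goal, plan: Plan,
                         time_spent: float, new_probability: float) -> None:
    """Apply both revisions: the plan's description and the super-goal's standards."""
    plan.values["probability_of_success"] = new_probability  # revision of the belief
    goal.standards["time"] -= time_spent                     # revision of the goal


g = Goal("g", {"time": 100})
g1 = Plan("g1", {"probability_of_success": 0.8, "time": 10})

# g1 was tried and failed after spending its 10 time units.
revise_after_failure(g, g1, time_spent=g1.values["time"], new_probability=0.7)
```

Dropping either assignment leaves the system inconsistent: without the first, the agent keeps overrating the failed plan; without the second, it forgets that part of the time allowance is gone.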

So we have $r(s_i) \in \mathit{Results}$ (the set where the assessment of the decisions takes 
values), the evaluation of the execution of the selected plan. To update our system of 
values, we need a function $G$ that takes $v_{i1}, \ldots, v_{in}$ and $r(s_i)$ as arguments and returns 
$v_{i1}', \ldots, v_{in}'$, the new characterisation of $s_i$. $G$ also acts over $g$, the goal ‘responsible’ 
for choosing its sub-goal $s_i$. 



4 A Case Study 

For demonstration and application purposes, we consider the following scenario. 



4.1 A Disturbing Situation 

Imagine you continuously receive E-mail messages containing attached files that 
demand some special treatment: uncompressing, translating, converting, whatever. 
For some reason, your mailer does not do that automatically, as you would expect. 
Possibly, your computer does not contain the necessary software. You usually solve 
that problem by forwarding those messages to someone who takes care of them and 
prints their contents. Every now and then, the person who does that for you is not 
available, so you have to send back the message and ask for delivery in another 
format, hoping that this new format is finally decipherable. After several rounds of 
unsuccessful attempts, you finally lose it, and angrily demand that everything be sent 
to you in plain ASCII, and wait for better days. Luckily, your mailer never collapsed 
during the whole process (yet?). 

What we aim to do is to represent the decisional challenge posed by such a 
disturbing process. The behaviour just described is arguably rational. We do not intend 
to produce a rule that implements such a behaviour. Rather, we want to describe a 
piece of an agent’s mind (this agent could be the agent that handles your E-mail for 




Decisions Based upon Multiple Values: The BVG Agent Architecture 307 



you) that can not only decide what to do in every situation that appears during the 
process, but does so for about the same reasons that any of us would consider. In 
particular it should be aware of why it takes its decisions and be prepared to change 
those decisions if the situation asks for such change. We want to represent the moods 
of the agent that is called for decision, and how those moods influence its 
decision-making. Of course in this particular example, we will have to make simplifications. 



4.2 Description of the Decision Problem 

We will consider a main goal, Print(Message), and the options the agent will face are 
candidate ways of fulfilling that goal. We will represent these as beliefs that stand for 
plans in the agent’s mind. The simplest way would be to Click(Print); another way 
would be to forward the task to another, more capable agent, AskForHelp(Someone, 
Print(Message)). Finally, the agent can try to solve the problem by asking the sender for the 
message in a new format, by invoking the method GetNewMessage(Message). Let us 
assume for the time being that when any one of these candidate sub-goals is 
executed, the agent somehow has access to whether it was successful or not. 

Now, let us consider the relevant evaluation features for our agent. We will base 
our decisions on measures taken along four different dimensions, the so-called 
relevant values for this situation: probability of success ($V_1$), patience consumption 
($V_2$), expected time delay ($V_3$), and independence of help by external sources ($V_4$). As 
you would expect we would prefer to pick a solution that maximises the probability of 
success and independence of external help, while minimising patience consumption 
and time delay. We could in some way try to relate these four values, but we will keep 
them independent for the sake of the experiment as we designed it. 

The standards for candidate goals to be matched against are related to the particular 
situation the agent is facing. For the sake of simplicity (to minimise the number of 
beliefs that describe the situation, and to keep the experiment manageable) we chose 
to associate those standards with the goals involved with the situation, and to update 
only the characterisation of the plans we choose. So, the four numbers associated with 
the main goal in the initial situation represent the levels of exigency the agent has 
concerning that goal. $v_1 = 0.6$ ($\in [0, 1]$) means that the agent will pick no plan whose 
probability of success is known to be less than 0.6; $v_2 = 1$ ($\in [-1, 1]$) means the agent 
has a capital of patience equal to one, the maximum possible; $v_3 = 1000$ ($\in \mathbb{N}$) means 
the agent wants the goal to be fulfilled before 1000 minutes pass; $v_4 = 1$ ($\in [-1, 1]$) 
means that the agent prefers a plan that does not require help from others. 

For the beliefs (that represent the plans for achieving the goal), the four values 
represent the expectations or consumptions of each candidate for execution. For 
instance, 

$$\mathrm{Bel}_{((V_1,\,2/3),\ (V_2,\,-0.1),\ (V_3,\,1),\ (V_4,\,0))}\ \mathrm{Click}(\mathrm{Print}) \rightarrow \mathrm{Print}(\mathrm{Message})$$

means that the expectation of success of this plan is 2/3, the consumption of patience 
is 0.1, the expected time for execution is 1 minute, and the independence from other 
agents is total, represented by 0. 

In what concerns the probability of success, we will assume that the agent does not 
possess any initial information. If the agent did possess information we could choose 
from two alternative probabilistic settings: either a frequentist perspective if we had 
any experimentation or prior simulation, or a Bayesian perspective, where the designer 
would express her convictions through an adequate a priori probability distribution 
[4]. In the present conditions, the only way to update the attitude of the agent about 
probability is to use a subjectivist perspective of the concept of probability. Initially 
we use the principle of insufficient reason to assign a 1/2 probability to each candidate 
sub-goal (the same probability for success and failure). The update of this probability 
will be made through Laplace’s rule: the probability of success will be $(r+1)/(n+2)$, 
where $r$ is the number of successes so far, and $n$ is the number of attempts so far. This 
way, when the agent ranges through the possible actions, it will take into account 
different values for the probability of success resulting from possible revisions. 
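
Laplace’s rule as stated above is easy to check numerically. The sketch below uses exact fractions; the function name `laplace_estimate` is ours and is not taken from the paper.

```python
from fractions import Fraction


def laplace_estimate(successes: int, attempts: int) -> Fraction:
    """Laplace's rule: estimated probability of success is (r + 1) / (n + 2)."""
    if not 0 <= successes <= attempts:
        raise ValueError("need 0 <= successes <= attempts")
    return Fraction(successes + 1, attempts + 2)


# With no experience the rule gives the 1/2 of the principle of insufficient reason.
prior = laplace_estimate(0, 0)

# A run of one, two and three failures lowers the estimate step by step.
after_failures = [laplace_estimate(0, n) for n in (1, 2, 3)]

# A single success on the first attempt raises it instead.
after_success = laplace_estimate(1, 1)
```

The estimate never reaches 0 or 1, so a plan that has only failed so far can still be reconsidered later.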

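So the initial situation is represented by the following sentences: 

$$G:\ \mathrm{Goal}_{((V_1,\,0.6),\ (V_2,\,1),\ (V_3,\,1000),\ (V_4,\,1))}\ \mathrm{Print}(\mathrm{Message}) \tag{8}$$

$$B1:\ \mathrm{Bel}_{((V_1,\,1/2),\ (V_2,\,-0.1),\ (V_3,\,-1),\ (V_4,\,0))}\ \mathrm{Click}(\mathrm{Print}) \rightarrow \mathrm{Print}(\mathrm{Msg}) \tag{9}$$

$$B2:\ \mathrm{Bel}_{((V_1,\,1/2),\ (V_2,\,-0.3),\ (V_3,\,-20),\ (V_4,\,-0.5))}\ \mathrm{AskForHelp}(\mathrm{Someone}, \mathrm{Print}(\mathrm{Msg})) \rightarrow \mathrm{Print}(\mathrm{Msg}) \tag{10}$$

$$B3:\ \mathrm{Bel}_{((V_1,\,1/2),\ (V_2,\,0),\ (V_3,\,-200),\ (V_4,\,-1))}\ \mathrm{GetNewMsg}(\mathrm{Msg}) \rightarrow \mathrm{Print}(\mathrm{Msg}) \tag{11}$$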

For the choice function $F$ we picked a linear combination of the parameters 
involved, which we will try to maximise. We determined empirically an acceptable 
combination of weights for this function, and arrived at 

$$F(v_1, v_2, v_3, v_4, v_{i1}, v_{i2}, v_{i3}, v_{i4}) = 0.5\,(v_1 + v_{i1}) + 0.4\,(v_2 + v_{i2}) + 0.001\,(v_3 + v_{i3}) + 0.099\,(v_4 + v_{i4}) \tag{12}$$
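
As a worked check, the weighted sum can be evaluated on the initial situation. The sketch assumes that the goal’s first standard is the 0.6 given in the prose and that the 0.4 weight applies to the patience dimension; the names `choice_value`, `GOAL` and `PLANS` are ours.

```python
# Weights of the linear choice function, one per dimension V1..V4.
WEIGHTS = (0.5, 0.4, 0.001, 0.099)


def choice_value(goal, plan, weights=WEIGHTS):
    """Weighted sum of (goal standard + plan measure) over the four dimensions."""
    return sum(w * (g + p) for w, g, p in zip(weights, goal, plan))


# Standards of the main goal (assumption: v1 = 0.6, as stated in the prose).
GOAL = (0.6, 1.0, 1000.0, 1.0)

# Initial characterisation of the three plans B1, B2 and B3.
PLANS = {
    "B1": (0.5, -0.1, -1.0, 0.0),      # Click(Print)
    "B2": (0.5, -0.3, -20.0, -0.5),    # AskForHelp(Someone, Print(Msg))
    "B3": (0.5, 0.0, -200.0, -1.0),    # GetNewMsg(Msg)
}

scores = {name: choice_value(GOAL, values) for name, values in PLANS.items()}
best = max(scores, key=scores.get)
```

Under these assumptions the scores are about 2.008 for B1, 1.8595 for B2 and 1.75 for B3, so the agent starts with the simplest plan. Because the goal’s standards enter every score identically, they shift all candidates by the same amount and do not affect the ranking.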

For the update function $G$, we chose to feed back only success or failure, 
represented respectively by 1 and -1. We arrived promptly at the conclusion that we 
lacked information with which to update the plans’ characterisation. For instance, how 
do we update time expectations when we don’t even know if the plan failed due to a 
deadline arriving or if it failed tout court? Even so, for demonstration purposes, we 
decided to keep things simple and leave alone values $v_{i3}$ and $v_{i4}$. We only update $v_{i1}$ 
and $v_{i2}$, in the following way (for $s_i$, the chosen sub-goal): 

$$G_1(r(s_i), v_{i1}, v_{i2}, v_{i3}, v_{i4}) = \frac{s + r(s_i)}{n + 1}, \quad \text{where } v_{i1} = s/n \text{ (Laplace’s rule)} \tag{13}$$

$$G_2(r(s_i), v_{i1}, v_{i2}, v_{i3}, v_{i4}) = \mathrm{power}(2, -r(s_i)) \cdot v_{i2} \tag{14}$$
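
The patience update can be illustrated with a short sketch. It reads the second update function as scaling the plan’s patience consumption by 2 raised to minus the result, which is our interpretation of the formula; clamping the outcome to [-1, 1], the range given for this dimension, is a further assumption, and the name `update_patience` is hypothetical.

```python
def update_patience(result: int, patience_cost: float) -> float:
    """Scale the patience consumption by 2 ** (-result).

    result is 1 for success and -1 for failure, so a failure doubles the
    (negative) cost and a success halves it.
    """
    if result not in (1, -1):
        raise ValueError("result must be 1 (success) or -1 (failure)")
    scaled = (2.0 ** -result) * patience_cost
    # Assumption: keep the measure inside the [-1, 1] range of this dimension.
    return max(-1.0, min(1.0, scaled))


after_first_failure = update_patience(-1, -0.1)    # cost of Click(Print) doubles
after_second_failure = update_patience(-1, after_first_failure)
after_success = update_patience(1, -0.3)           # cost of asking for help halves
```

With this reading, repeated failures make a plan rapidly more ‘irritating’, while a plan whose patience cost is 0 is unaffected by either outcome.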

Let us look at the execution of the agent in a concrete example, before we conduct 
more exhaustive experiments, by simulating a certain random number of successes 
and failures in the several possible cases. In this example, shown in table 1, we present 
the case when nothing works, that is, we will look at the evolution of the agent’s 
choices through a series of failures. 



Table 1. Agent’s choices when everything fails. 

Iteration    …  7  …  10 
Choice       B1    B1    B2    B1    B3    B3    B2    B3    B1 
Result       -1    -1    -1    -1    -1    -1    -1    -1    -1 
New $V_1$    1/3   1/4   1/3   1/5   1/3   1/4   1/4   1/5   1/6 
New $V_2$    -0.2  -0.4  -0.6  -0.8  …     0     -1    0     -1 

In terms of our description, the agent tries the simplest solution twice, before he 
decides to ask for help. When this fails, he still tries to solve things himself before 
choosing the more demanding option of sending everything back. When this fails 
twice, he tries to get local help again. After that, he sends things back a couple of 
times, before he dares to try again to do things himself. 
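
The behaviour just described can be re-created with a small simulation. This is a sketch under stated assumptions, not the authors’ program: the probability of success follows Laplace’s rule, a failure doubles the patience consumption (clamped to -1), the goal’s standards are never revised, and the goal’s first standard is 0.6. All function and variable names are ours.

```python
from fractions import Fraction

WEIGHTS = (0.5, 0.4, 0.001, 0.099)
GOAL = (0.6, 1.0, 1000.0, 1.0)   # assumption: v1 = 0.6, as in the prose


def make_plans():
    """Initial plans: patience (v2), time (v3), independence (v4), history."""
    return {
        "B1": {"v2": -0.1, "v3": -1.0, "v4": 0.0, "successes": 0, "attempts": 0},
        "B2": {"v2": -0.3, "v3": -20.0, "v4": -0.5, "successes": 0, "attempts": 0},
        "B3": {"v2": 0.0, "v3": -200.0, "v4": -1.0, "successes": 0, "attempts": 0},
    }


def score(plan):
    """Linear choice function applied to the goal and one plan."""
    v1 = float(Fraction(plan["successes"] + 1, plan["attempts"] + 2))
    values = (v1, plan["v2"], plan["v3"], plan["v4"])
    return sum(w * (g + v) for w, g, v in zip(WEIGHTS, GOAL, values))


def run_all_failures(steps):
    """Let the agent choose `steps` times while every execution fails."""
    plans = make_plans()
    choices = []
    for _ in range(steps):
        chosen = max(plans, key=lambda name: score(plans[name]))
        choices.append(chosen)
        plan = plans[chosen]
        plan["attempts"] += 1                      # the attempt fails
        plan["v2"] = max(-1.0, 2.0 * plan["v2"])   # failure doubles the patience cost
    return choices


choices = run_all_failures(8)
```

Under these assumptions the first eight choices are B1, B1, B2, B1, B3, B3, B2, B3, which agrees with the opening of the sequence described above. Later steps depend on details the text does not fix, such as whether the goal’s standards are also revised, so the sketch makes no claim about them.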

4.3 Results 

Now we look at some runs of our simulation, and observe our agent’s behaviour. We 
skip the case when everything goes well, since success only reinforces the choice 
made. Table 2 summarises a run of choices made when probability of success (of any 
alternative) is 0.5. 




We notice that even a small number of initial successes gives such 
reinforcement to an already good option that even after a large number of failures the 
agent still picks this option. If this is undesirable (which depends on the designer’s 
options) we could try to improve the structure of the choice function (for instance, by 
using a non-linear function, see again [2] for examples), or to enhance the information 
we feed back (see subsection 2.4). In other runs with the same probability of success, 
the agent did not stick so much to the same option. 

Table 3 shows results for a simulation with 25% probability of failure. Again a 
series of successes determines a lot of positive reinforcement for the option that 
originated them. In this case, the initial failures of alternative B1 determine the choice 
of B2, and then the low percentage of failures assures that the agent sticks to B2. 




A greater probability of failure would drive our agent into a greater variability in 
his choices. The results would then tend to be closer to the ones shown in table 1. 
So, in the last simulation we show (table 4), we present the case when the probability 
of success is not fixed, but varies with time. In this particular experiment, we ‘let’ the 
agent be correct in his estimate of the probability of success. This means that we 
simulate the result of the agent’s tentative execution by using his own belief about 
the probability of success of the option he has chosen. 

A few dozen runs of the experiment show that the results in table 4 are typical. The 
probability of success starts by being 0.5, varies for a while, but quickly tends to 
stabilise around that value. So the results are quite like those of table 2: sooner or later 
some option is chosen and gets a series of positive responses from the world. By then 
it has gained a substantial advantage over the others, and is only passed over 
occasionally. 



Table 2. Agent’s choices when goals fail half the time. 

Table 3. Agent’s choices when goals fail 25% of the time. 

…  -0.019  -0.038  -0.019  -0.038 

Table 4. Agent’s estimates are right about success and failure. 

5 Concluding Remarks 

We addressed the problem of choice in the light shed by the introduction of multiple 
values to assess a decision situation. The agent’s choice machinery becomes 
clearer, as agents express their preferences through the use of this multiple value 
framework. Choice is performed by collapsing the various assessments into a choice 
function, which cannot be considered equivalent to a utility function, since it is 
computed at execution time. Moreover, by feeding back assessments of the quality of 
the previous decision into the agent’s decision process, we gain in adaptation 
capabilities. Our agents’ decisions no longer depend solely on the past events as 
known at design time. Instead, events are incorporated into the decision machinery as 
time passes, and the components of those processes evolve continuously to be 
aggregated just when a decision is needed. This is a step towards real autonomy. 

We could only show limited application of these ideas. Other experiments were 
made, such as the use of this framework to expand on Axelrod’s ‘model of tributes’ 
[3]. We also developed a formalism for the description of decision situations, 
which was partially used in this paper, although it allows for more general 
situations than the ones contained here. Future research will address the increase of 
the agent’s adaptability by enhancing the use of the fed-back information. Another 
issue is the expansion of an agent’s system of values as a result of interaction with 
other agents. 

Abstract. Diagnostic problem solving aims to explain an observed divergence 
from the proper functioning of some case, human or other. The 
paper presents a temporal-abductive framework for diagnostic problem 
solving focusing on the integration of time. It is argued that time can 
be intrinsically relevant to diagnostic reasoning and as such it should be 
treated as an integral aspect of the knowledge and reasoning of a diagnostic 
problem solver. The proposal for achieving this is to model all 
relevant concepts as time-objects. 

1 Significance of Time 

Consider the following simplified description of the dysmorphic syndrome Mor- 
quio: 

Morquio presents from the age of 1 year and persists throughout the lifetime of the
patient. People suffering from Morquio can exhibit the following: short trunk; sloping
acetabulae; generalised platyspondyly, from the age of 1 year; thoraco-lumbar kyphosis
from the age of 4 years; and progressive resorption of the femoral-capital epiphyses from
the age of 2 years onwards; more specifically flatness of the femoral-capital epiphyses
appears at the age of 2 years and persists up to an age between 8 and 15 years, and
from then onwards the ossification of femoral-capital epiphyses is absent.

The emphasized text refers to time. The references are absolute, where oc- 
currences are specified with respect to some (generic) fixed time-point, which 
here is birth. One of the manifestations expresses a temporal trend, “progressive 
resorption of femoral-capital epiphyses” which starts at the age of 2 years. At a 
finer level of description the trend is divided into two meeting phases, a phase of 
flatness and a phase of absence in ossification. The exact meeting point between 
the two phases (or change point from flatness to absence) is uncertain. It can be 
at any time between the ages of 8 and 15 years. Thus the earliest termination 
(initiation) of the first (second) phase is 8 years of age and its latest termination 
(initiation) is 15 years of age. Manifestations like these, in which time constitutes
an integral aspect, are time-objects, associations between properties and existen-
ces, e.g., (platyspondyly, from-the-age-of-1-year), where “platyspondyly” is the
property and “from the age of 1 year” the existence. The above description of
Morquio gives the overall model for this disorder. Such models need to be (tem-
porally) adapted to the case under consideration. For example Morquio presents
a different picture for a one year old, a three year old, or a seventeen year old.
In this domain, case (patient) data are largely obtained from radiographs
that give discrete snapshots of the development of the patient’s skeleton. For 
example consider the following data on some patient: 

Carpal-bones small at the age of 10 years; femoral-capital-epiphyses abnormal at
the age of 2 years; femoral-capital-epiphyses flat and irregular at the age of 7 years;
vertebral-end-plates irregular at the age of 7 years.

The case information is point-based in contrast to the (medical) knowledge 
which is largely interval-based. A competent, knowledge-based, diagnostic sy- 
stem must be able to derive (temporal) abstractions from the given information, 
which fill in the gaps and can be directly matched against the model of a dis- 
order, for a patient of that age, thus, for example, concluding Morquio as the 
explanation of the abnormal observations for the above patient. 

Although much research work is reported in abductive diagnostic reasoning 
e.g., 12131511611712(11 . etc., relatively little is reported in temporal-abductive dia-
gnosis I4I6I7I18I15I231 . In these approaches the emphasis is on incorporating
temporal constraints on causal relations I4TT51 . where diagnostic knowledge is 
modelled in terms of a single causal network, or on incorporating temporal con- 
straints on the temporal extents of occurrences wm . where each process is 
modelled separately through its own temporal graph or state description model. 
Temporal uncertainty and incompleteness is recognized as a necessary represen- 
tation aspect, which is expressed either in an absolute way (ranges for delays, 
durations, or temporal extents), or a relative way (disjunction of temporal rela- 
tions between two intervals). 

In all these approaches, occurrences are treated as indivisible entities. Moreo- 
ver, such occurrences are not treated as dynamic entities, embodying time as an 
integral aspect, and interacting with each other; time is loosely associated with 
them by pairing an atemporal entity with some time interval, e.g., by including 
the interval as yet another argument of the relevant predicates. Recurring phe- 
nomena and periodicity in general, as well as temporal trends are not addressed, 
since these require compound occurrences. 

In our approach the integration of time is achieved by modelling the implica- 
ted concepts (failures, faults and therapeutic actions) as time-objects. In addition 
compound occurrences are supported. In this paper we present work building
on our previous efforts in temporal abductive diagnosis. In particular we
present a temporal classification of failures and faults and propose the modelling 
of failures in terms of causal networks of time-objects. 

The rest of this paper is organized as follows. Section 2 gives a global view
of diagnostic reasoning from the temporal abductive perspective. Section 3 over-
views the adopted temporal ontology and section 4 discusses models for failures,
faults, and therapeutic actions in terms of time-objects. Section 5 outlines me-
chanisms for the formation of potential diagnostic solutions and section 6 defines
predicates accounts-for and in-conflict-with, which form the basis for the eva-
luation of diagnostic solutions. Finally section 7 concludes the discussion.
2 Diagnostic Reasoning

2.1 Application Context, Diagnostic Theory, and Case Histories 

The application context (AC) delineates the extent of the diagnostician’s “ex-
pertise” or competence and thus the scope of its diagnostic theory. The role of
the application context is in recognizing whether a particular problem is within,
at the periphery, or outside the diagnostician’s expertise, prior to attempting to
solve the problem. Generally speaking the application context specifies the types
of cases (e.g. human) and the types of failure addressed (e.g. single or multiple
failures and of what sort).

The diagnostic theory (DT) constitutes the knowledge of the diagnostic sy-
stem. In this paper we are interested in temporal-abductive diagnostic theories,
i.e. theories with explicit notions of time whose purpose is to best explain (ac- 
count for) abnormal situations. In this respect a central component of the theory 
is the set of temporal models for the distinct failures covered by the application 
context. In addition a diagnostic theory includes background knowledge. To draw 
a comparison with the Theorist framework mm , the failure models correspond 
to conjectures (abnormality assumptions that are only considered if there is evi- 
dence suggesting them), while background knowledge comprises both defaults, 
normality assumptions which are assumed to hold unless there is evidence to the 
contrary (e.g. normal evolution of ossification processes), and facts (e.g. anato- 
mical knowledge). Finally the background part of a diagnostic theory includes 
models of therapeutic (or other) actions of relevance to the covered failures from 
a diagnostic perspective. 

A case history (CH) gives factual information on an actual case. It is a tem-
poral (historical) database on the case. A case history is therefore continuously
updated with new information on that case. The unaccounted observations (of 
misbehaviour) that constitute a diagnostic problem are also part of the case 
history. 



2.2 Temporal-Abductive Diagnosis 

A case’s history consists of temporal assertions. A temporal assertion, $(p, vt)$, is
an association between some property, $p$, and an interval of valid time, $vt$, with
respect to the real time-axis. It means that at the current point of time it is belie-
ved that property $p$ holds (is valid), over time interval $vt$, for the particular case.
Thus let $CH_t$ be a case history at time $t$ and let $S_t$ be a potentially abducible dia-
gnostic solution for the particular case at time $t$, i.e., $DT \cup CH_t \rightarrow S_t$, where
$\rightarrow$ stands for “suggests”, not logical implication, since the inference is abductive and
thus plausible in nature. If $S_t = \{(\neg f, t) \mid f \in \text{failures under } AC\}$, i.e. none
of the covered failures is believed to hold at time $t$, the case is assumed to be
functioning normally at time $t$ (with respect to $AC$) and $S_t$ would be unique.
Otherwise the case is malfunctioning and $S_t$ is an explanation. If no failure can
be established to hold at time $t$, although there are observations of ongoing
abnormality, it is possible that a transient failure has caused a persistent fault.
The adopted temporal ontology (see section 3) enables the modelling of a
transient or persistent failure, both of which can cause a transient or persistent
fault. Thus, like the case history, a potential diagnostic solution at time $t$, $S_t$,
consists of temporal assertions, say $(f, vt)$. This means that at time point $t$, in
the context of the given potential diagnostic solution, it is assumed that the
particular case has failure $f$ during period $vt$. Depending on the relation of the
valid time interval, $vt$, to the present point in time, the hypothesised failure, $f$,
is a past or an ongoing one. Thus $S_t$ consists of assertions (or more accurately
assumptions) of past and/or ongoing failures that explain observations of (past
and/or ongoing) abnormality included in the case history. Each potentially ab-
ducible diagnostic solution at time $t$, $S_{i,t}$, $i = 1, \ldots, n$, constitutes a hypothetical
extension of the case history, $CH_t$. Only a “confirmed” (believed) diagnostic so-
lution can become part of the case history. Assertions about actual therapeutic
actions, performed on the case, are directly entered into the case history.

3 Time Ontology 

The principal primitives of the adopted time ontology are the time-axis and
the time-object which respectively provide a model of time [12] and a model of
occurrences [8, 11].

A time-axis, $\alpha$, represents a period of valid time from a given conceptual
perspective. It is expressed discretely as a sequence of time-values,
$Times(\alpha) = \{t_1, t_2, \ldots, t_i, \ldots, t_n\}$, relative to some origin. Time-axes are of two
types, atomic axes and spanning axes. An atomic axis has a single granularity (time-unit),
that defines the distance between successive pairs of time-values on the axis. Its
time-values are expressed as integers. A spanning axis spans a chain of other
time-axes. It has a hybrid granularity formed from the granularities of its com-
ponents, and its time-values, also inherited from its components, are tuples [12].
Some application can involve a single atomic axis and a single granularity, while 
another application a collection of time-axes and multiple granularities, where 
the same period can be modelled from different conceptual perspectives. Relevant 
time-axes for the domain of skeletal dysplasias could be fetal-period, infancy, 
childhood, puberty and maturity, the latter four collectively forming a spanning 
axis of lifetime. Similarly childhood could be a spanning axis, decomposed into 
early, mid and late childhood. The others could be atomic axes, where the gra- 
nularity for fetal-period and infancy could be months, while for maturity years, 
etc. If the origin of all these axes is birth, the time-values for fetal-period would 
be {-10, -9, ..., 0}, a negative value denoting a time before the origin, which is 
denoted by 0. These are general, or abstract, time-axes whose origin is a generic 
time-point. Such time-axes can be instantiated for specific cases by binding their 
abstract origin to an actual time point, thus becoming concrete time-axes. The 
distinction between abstract and concrete, which applies to time-objects as well, 
is important; a diagnostic theory is expressed at an abstract level, a case hi- 
story at a concrete level. For brevity, in the following discussion we assume that 
a single abstract atomic axis is used in the definition of the diagnostic theory. 
This is instantiated to provide the concrete time axis for the definition of the
particular case history.
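
The distinction between abstract and concrete axes can be illustrated with a minimal sketch. It assumes a single atomic axis whose time-values are integers relative to origin 0; the names `AtomicAxis`, `instantiate` and `to_actual`, the upper bound of 80 years and the birth year 1990 are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class AtomicAxis:
    """Atomic time-axis: a single granularity and integer time-values."""
    name: str
    granularity: str               # time-unit, e.g. "months" or "years"
    first: int                     # first time-value, relative to origin 0
    last: int                      # last time-value, relative to origin 0
    origin: Optional[int] = None   # None: abstract axis; set: concrete axis

    def times(self) -> List[int]:
        """Times(axis): the discrete sequence of time-values."""
        return list(range(self.first, self.last + 1))

    def instantiate(self, actual_origin: int) -> "AtomicAxis":
        """Bind the generic origin (e.g. birth) to an actual time point."""
        return replace(self, origin=actual_origin)

    def to_actual(self, t: int) -> int:
        """Map a time-value to an actual time point on a concrete axis."""
        if self.origin is None:
            raise ValueError("abstract axis: origin is not bound")
        return self.origin + t


# Abstract axis from the text: fetal period in months, origin = birth.
fetal_period = AtomicAxis("fetal-period", "months", -10, 0)

# Hypothetical concrete axis: lifetime in years for a patient born in 1990.
lifetime = AtomicAxis("lifetime", "years", 0, 80).instantiate(1990)
```

Under these assumptions an observation "at the age of 7 years" lands on the actual year `lifetime.to_actual(7)`, whereas the abstract axis refuses the mapping because its origin is still generic.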

A time-object is a dynamic entity for which time constitutes an integral as- 
pect. It is an association between a property and an existence. The manifestations 
of Morquio are examples of time-objects, e.g., (thoraco-lumbar kyphosis, from
the age of 4 years), (short-trunk, from the age of 1 year), etc. The notion
of a time-object enables the definition of different types of occurrences, such 
as simple (atomic) occurrences, or compound occurrences (such as trend occur- 
rences, periodic occurrences, or any other pattern of simpler occurrences). To 
be able to express such occurrences, the ontology of time-objects includes three 
types of relations between time-objects: temporal relations which are adapted 
and extended from Allen’s set [1], structural relations, enabling composition and
decomposition, and causal relations. 

Time-objects, like time-axes, are either abstract or concrete. Disorder models
consist of abstract time-objects, while case histories consist of concrete time-objects.
The existence of abstract/concrete time-objects is given with respect to ab-
stract/concrete time-axes. Given the multiplicity of time-axes, formally a
time-object, $\tau$, is defined as a pair $(\pi_\tau, \epsilon_\tau)$ where $\pi_\tau$ is the property of $\tau$ and
$\epsilon_\tau$ its existence function. The existence function $\epsilon$ is in fact a two-parameter
function, $\epsilon(\tau, \alpha)$; $\epsilon_\tau$ simply denotes the partial parameterization of the func-
tion with respect to $\tau$. Similarly $\pi_\tau$ denotes the (full) parameterization of the
single parameter function $\pi(\tau)$. The time-axis that provides the most appro-
priate conceptual context for expressing the existence of $\tau$ is referred to as the
main time-axis for $\tau$ and the expression of $\tau$’s existence with respect to its main
time-axis, as its base existence. The existence function, $\epsilon_\tau$, maps the base exi-
stence of $\tau$ to other conceptual contexts (time-axes). A time-object has a valid
existence on some time-axis iff the granularity of the time-axis is meaningful to
the property of the time-object (see below) and the span of time modelled by
the time-axis covers (possibly partially) the base existence of the time-object.
If time-object $\tau$ does not have a valid existence in the context of time-axis $\alpha$,
$\epsilon_\tau(\alpha) = \bot$ (the time-object is undefined with respect to the particular temporal
context). If time-object $\tau$ has a valid existence on some time-axis $\alpha$, its existence
on $\alpha$, $\epsilon_\tau(\alpha)$, is given as:

$\epsilon_\tau(\alpha) = (t_s, t_f, \varsigma)$ where $t_s, t_f \in Times(\alpha)$;

$t_s \le t_f$; and $\varsigma \in \{\text{closed}, \text{open}, \text{open-from-left}, \text{open-from-right}, \text{moving}\}$.

Time-values $t_s$ and $t_f$ respectively give the (earliest) start and (latest) finish
of the time-object on $\alpha$. The third element of the existence expression, $\varsigma$, gives
the status of $\tau$ on $\alpha$. If the status is closed the existence of the time-object,
and hence its duration, is fixed. Otherwise the status denotes openness (i.e.
vagueness) on one or both ends of the existence. In the case of openness at
the start, $t_s$ gives the earliest possible start, while function le-fr$_\tau(\alpha)$ gives the
latest possible start. Similarly in the case of openness at the finish, $t_f$ gives the
latest possible finish, while function ri-fr$_\tau(\alpha)$ gives the earliest possible finish.
The function names le-fr and ri-fr are acronyms for ‘left freedom’ and ‘right
freedom’. Thus the duration of a non-closed existence of a time-object can only
be shortened.
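
As a rough illustration of why a non-closed existence can only be shortened, the sketch below stores an existence on one axis together with its optional left and right freedom. The class name `Existence`, the field names, and the reading of "open-from-right" as openness at the finish are assumptions made for this example; durations are simply differences of integer time-values.

```python
from dataclasses import dataclass
from typing import Optional

STATUSES = {"closed", "open", "open-from-left", "open-from-right", "moving"}


@dataclass(frozen=True)
class Existence:
    ts: int                        # (earliest) start
    tf: int                        # (latest) finish
    status: str = "closed"
    le_fr: Optional[int] = None    # latest possible start (open at the start)
    ri_fr: Optional[int] = None    # earliest possible finish (open at the finish)

    def __post_init__(self) -> None:
        if self.status not in STATUSES:
            raise ValueError(f"unknown status: {self.status}")
        if self.ts > self.tf:
            raise ValueError("start must not follow finish")

    def max_duration(self) -> int:
        """Longest extent: earliest start to latest finish."""
        return self.tf - self.ts

    def min_duration(self) -> int:
        """Shortest extent: latest start to earliest finish."""
        latest_start = self.ts if self.le_fr is None else self.le_fr
        earliest_finish = self.tf if self.ri_fr is None else self.ri_fr
        return max(0, earliest_finish - latest_start)


# Flatness of the femoral-capital epiphyses in Morquio: starts at 2 years,
# finishes somewhere between 8 and 15 years of age.
flatness = Existence(ts=2, tf=15, status="open-from-right", ri_fr=8)
```

Here the extent recorded by $t_s$ and $t_f$ is the widest one possible; resolving the uncertainty at the finish can only move the actual finish towards `ri_fr`.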

Hence a time-object can exist as a point-object on some time-axis but as an 
interval-object on another time-axis. In the former case the temporal extent of 
the time-object is less than the time-unit of the particular time-axis. If a time- 
object is a point-object under some time-axis, it is treated as an indivisible (non 
decomposable) entity under that time-axis. A special moving time-object is now 
which exists as a point-object on any relevant concrete time-axis and functions 
to partition (concrete) time-objects into past, future, or ongoing. 

The structural relations between time-objects are isa-component-of, and its
inverse contains, and variant-component, and its inverse variant-contains; the
latter two express conditional containment of optional components:

Axiom 1: $\text{contains}(\tau_i, \tau_j) \Leftarrow \text{variant-contains}(\tau_i, \tau_j, c) \land \text{conds-hold}(c)$.

A variant component can only be assumed in some case if the specified condi-
tions are satisfied. For example, aspects of the ossification processes for carpals,
radius and tarsal epiphyses differ between boys and girls. These distinctions
can be conveniently modelled through variant components of the particular os-
sification processes. A compound time-object has a valid existence under any
time-axis in which at least one of its components has a valid existence, and a
component time-object exists within the one that contains it. Temporal views
of a compound time-object, from the perspective of specific temporal contexts,
can thus be defined. Trends and periodic occurrences are modelled as compound
time-objects [10].
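
A minimal sketch of how Axiom 1 might be evaluated is given below. It covers only the derivation of contains from variant-contains facts; the component names, the `Case` dictionary and the function `contains` are hypothetical and are not part of the assertion language of the ontology.

```python
from typing import Callable, Dict, List, Tuple

Case = Dict[str, str]                  # hypothetical case description
Condition = Callable[[Case], bool]     # plays the role of conds-hold(c)

# variant-contains(whole, part, c): illustrative facts only.
VARIANT_CONTAINS: List[Tuple[str, str, Condition]] = [
    ("carpal-ossification", "carpal-ossification-girls",
     lambda case: case.get("sex") == "female"),
    ("carpal-ossification", "carpal-ossification-boys",
     lambda case: case.get("sex") == "male"),
]


def contains(whole: str, part: str, case: Case) -> bool:
    """Axiom 1: contains holds if a variant-contains fact exists for the
    pair and its conditions hold in the given case."""
    return any(
        w == whole and p == part and cond(case)
        for w, p, cond in VARIANT_CONTAINS
    )
```

For a female patient only the first variant is assumed to be a component of the ossification process; the second variant stays out of the compound time-object.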

Causality is a central relationship in diagnostic problem solving. The onto-
logy of time-objects includes relations causes, causality-link, and cause-spec,
which are defined at the level of abstract time-objects, concrete time-objects and
(abstract) properties, respectively. Relation $\text{causes}(\tau_i, \tau_j, cs, cf)$, where $\tau_i$
and $\tau_j$ are (abstract) time-objects, $cs$ a set of (temporal or other) constraints and
$cf$ a certainty factor, is used in the following axiom for deriving a causality-link
between a pair of concrete instances of $\tau_i$ and $\tau_j$. A general constraint, which
always needs to be satisfied, is that a potential effect cannot precede its potential
cause:

Axiom 2: $\text{causality-link}(\tau_i, \tau_j, cf) \Leftarrow \text{causes}(\tau_i, \tau_j, cs, cf) \land \text{conds-hold}(cs) \land \neg\text{starts-before}(\tau_j, \tau_i)$.

Even if all the specified conditions are satisfied by some case, it may still not
be definite that the causality-link actually exists. This is what the certainty
factor denotes. The uncertainty is due to knowledge incompleteness.
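
The derivation in Axiom 2 can be sketched as follows, under the assumption that concrete time-objects carry integer start and finish values on a single concrete axis. The names `TimeObject`, `starts_before` and `causality_link` are hypothetical; the function returns the certainty factor when a link is derived, since even then the link is not definite.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional

Case = Dict[str, str]


@dataclass(frozen=True)
class TimeObject:
    prop: str      # property of the time-object
    start: int     # (earliest) start on the concrete axis
    finish: int    # (latest) finish on the concrete axis


def starts_before(a: TimeObject, b: TimeObject) -> bool:
    return a.start < b.start


def causality_link(cause: TimeObject, effect: TimeObject,
                   conditions: Iterable[Callable[[Case], bool]],
                   cf: str, case: Case) -> Optional[str]:
    """Axiom 2: derive a link (returning its certainty factor) if all
    conditions hold and the effect does not start before the cause."""
    if all(cond(case) for cond in conditions) and not starts_before(effect, cause):
        return cf
    return None


morquio = TimeObject("Morquio present", 1, 80)
platyspondyly = TimeObject("platyspondyly", 1, 80)
too_early = TimeObject("platyspondyly", 0, 80)
```

With no underlying conditions, the first pair yields a link labelled with the given certainty factor, while the second pair is rejected because the potential effect would precede its potential cause.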

Properties, which constitute the other half of time-objects, are atomic or 
compound (negations, disjunctions, or conjunctions), passive or active, and some 
are time-invariant. Examples of properties are “sex male”, “sore throat”, “severe 
coughing”, “removal of tonsils”, etc. Properties have explicit temporal attributes. 
A property is associated with relevant granularities, e.g. “headache present” is 
associated with hours and days, but probably not months or years. This way 
the time-axes meaningful to a property can be defined. A property, p, either has 
an infinite persistence, infper(p), or a finite persistence, finper(p). In the latter 
case the following are additionally specified: whether the property can recur;
maximum and minimum durations (max-dur, min-dur) under any of the relevant
granularities, independently of any context in which they may be instantiated,
where the default is to persist indefinitely. A default margin for the initiation of
any instantiation of the property, under a relevant time-axis (earliest-init, latest-
init), which if not specified is assumed to be the entire extent of the particular
time-axis, is also included in the temporal attributes of properties. For example,
“Morquio present” is an infinitely persistent property whose earliest-init is one 
year of age. On the other hand “flu present” is a finitely persistent, recurring 
property, and “chicken pox present” is a finitely persistent, but normally not a 
recurring property. In addition, the ontology adopts the semantic attributes of
properties specified by Shoham, e.g. downward hereditary, upward hereditary,
solid, etc.
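
The temporal attributes of a property can be collected in a small record, sketched below. The class and field names are hypothetical, the granularities chosen for “flu present” are an assumption, and unspecified margins default to the whole axis as stated above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PropertyAttrs:
    name: str
    granularities: Tuple[str, ...]       # granularities meaningful to the property
    infinite_persistence: bool           # infper(p) versus finper(p)
    can_recur: bool = False
    min_dur: Optional[int] = None
    max_dur: Optional[int] = None        # None: may persist indefinitely
    earliest_init: Optional[int] = None  # None: start of the axis
    latest_init: Optional[int] = None    # None: end of the axis

    def meaningful_on(self, granularity: str) -> bool:
        return granularity in self.granularities

    def admissible_init(self, t: int, axis_first: int, axis_last: int) -> bool:
        """Is t inside the default initiation margin on the given axis?"""
        lo = axis_first if self.earliest_init is None else self.earliest_init
        hi = axis_last if self.latest_init is None else self.latest_init
        return lo <= t <= hi


morquio = PropertyAttrs("Morquio present", ("years",), True, earliest_init=1)
headache = PropertyAttrs("headache present", ("hours", "days"), False, can_recur=True)
flu = PropertyAttrs("flu present", ("days",), False, can_recur=True)
```

On a lifetime axis in years starting at birth, an initiation of “Morquio present” at time-value 0 falls outside the margin, and “headache present” has no valid existence on an axis whose granularity is months.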

Relation cause-spec between properties has six arguments, where the first 
two are properties, the third a granularity, the fourth and fifth sets of relative 
(TRel) and absolute (Css) temporal constraints respectively and the last one a 
certainty factor. This relation also enables the derivation of a causality-link 
between a pair of time-objects: 

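Axiom 3: $\text{causality-link}(\tau_i, \tau_j, cf) \Leftarrow$

$\text{cause-spec}(p_i, p_j, \mu, TRel, Css, cf) \land \pi(\tau_i) = p_i \land \pi(\tau_j) = p_j \land$

$\text{r-satisfied}(\tau_i, \tau_j, \mu, TRel) \land \text{a-satisfied}(\tau_i, \tau_j, Css) \land \neg\text{starts-before}(\tau_j, \tau_i)$.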

Other property relations include exclusion, necessitation, etc. The imple-
mentation of the time ontology in terms of meta, abstract and concrete layers
is discussed in [12]. This includes a declarative assertion language, combining
object-oriented, functional and logical features, for the expression of the various 
axioms. 

4 Modelling Failures, Faults, and Actions as 
Time-Objects 

A specific diagnostic activity operates within a (possibly moving) window of real 
time which at any instant of time gives the past and future period of interest. 
This is the concrete time-axis. The relevant history of a case is that which is 
covered by the concrete time-axis. 

4.1 Failure Model 

An abductive diagnostic theory primarily contains failure models for all the 
known failures covered by the application context. Again for the sake of sim- 
plicity we assume that a single, abstract, time-axis is used in the definition of 
some failure model. A necessary condition for the instantiation of a failure model 
(with respect to a case) is that the model’s abstract time-axis can be mapped 
onto the concrete time-axis. 

Typically a failure is a non-observable malfunction whose presence in some 
situation is detected through its observable manifestations, its associated faults. 
We classify failures and faults as follows from the temporal perspective: (a) infini-
tely persistent, either with a fixed or a variable initiation margin (e.g. Morquio);
(b) finitely persistent, but not recurring, again either with a fixed or a variable
initiation margin (e.g. chicken pox); and (c) finitely persistent which can recur
(here the initiation margin is variable), e.g. flu. The temporal extent of a finite
persistence is either indefinite or bounded (through minimum and maximum
durations).

A typical model of some failure is an (acyclic) causal structure comprising 
a number of causal paths, emanating from the node denoting the failure and 
terminating at nodes denoting (usually observable) faults. Intermediate nodes 
on such paths denote internal (usually unobservable) causal states; these are also 
temporally classified as explained for failures/faults. Such a causal structure is 
naturally expressed as a collection of abstract time-objects; each node corre- 
sponds to a time-object and each arc to the relevant instance of relation causes. 
Figure 1 gives Morquio’s model as a causal structure of time-objects. The text
inside a node gives the property of the time-object and the text outside gives 
its (base) existence ((earliest) start and (latest) finish). These existences are ex- 
pressed with respect to the single abstract time-axis (not depicted in the figure), 
lifetime say, whose granularity is years and origin birth. Some of these existences 
are uncertain. For example the exact meeting point of the two components of 
the trend time-object “femoral-capital-epiphyses progressive-resorption” is not 
known, but a margin for it can be specified (ages 8 to 15 years). The trend is 
expressed through a meta-qualification, “progressive-resorption”, over property 
“femoral-capital-epiphyses”.

The solid arcs in figure 1 are instances of relation causes. In the depicted
instances there are no underlying conditions, and the certainty factors are ex-
pressed in a qualitative fashion as nc (necessarily causes), uc (usually causes),
and mc (may cause), which appear as labels on the arcs. Thus each arc can be
expressed as $\text{causes}(\tau_i, \tau_j, \{\}, nc/uc/mc)$.

Fig. 1. Modelling Morquio as a causal network of time-objects

The overall causal network representing the diagnostic theory is therefore 
partitioned into distinct causal models for the various failures. The partitioning is 
necessary in order to allow multiple, dynamic, instantiations of the same failure, 
thus capturing recurring failures. There is only one model per failure; however 
the same failure can appear as an ordinary causal state node in another failure’s 
model. A primary cause does not have any causal antecedents; primary causes are
those failures which do not figure in some other failure’s model. Different failure
models are implicitly related through node sharing or explicitly related through
secondary triggers (see section 5). We assume that failures define diagnoses,
although observable primary failures that act as triggers (see below) may be 
excluded from this category. 

In [3], the causal network is a fully connected structure that does not permit
multiple instantiations of the same failure and hence periodic failures cannot
be dealt with. In addition and of relevance to the above “... one cannot deal
with changing data and thus periodic findings; moreover, one cannot take into
account the trend of the values of a parameter, which is usually a very important
piece of information for diagnosticians.” [1], p.300. Temporal data abstraction is
therefore not supported, nor are compound occurrences.
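
The partitioning of the diagnostic theory into one causal model per failure can be sketched as a small graph structure. The class `FailureModel`, the helper `primary_causes`, the certainty labels attached to the Morquio arcs and the two placeholder failures are all hypothetical; the actual labels appear only in the figure.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

CERTAINTY = {"nc", "uc", "mc"}   # necessarily / usually / may cause


class FailureModel:
    """Acyclic causal structure rooted at one failure."""

    def __init__(self, failure: str) -> None:
        self.failure = failure
        self.arcs: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def add_causes(self, cause: str, effect: str, certainty: str) -> None:
        if certainty not in CERTAINTY:
            raise ValueError(f"unknown certainty factor: {certainty}")
        self.arcs[cause].append((effect, certainty))

    def nodes(self) -> Set[str]:
        found = {self.failure}
        for cause, effects in self.arcs.items():
            found.add(cause)
            found.update(effect for effect, _ in effects)
        return found

    def terminal_nodes(self) -> List[str]:
        """Nodes without outgoing arcs (usually observable faults)."""
        return sorted(n for n in self.nodes() if not self.arcs.get(n))


def primary_causes(models: List[FailureModel]) -> List[str]:
    """Failures that do not figure in any other failure's model."""
    return sorted(
        m.failure for m in models
        if not any(m.failure in o.nodes() for o in models if o is not m)
    )


morquio = FailureModel("Morquio")
morquio.add_causes("Morquio", "short trunk", "uc")              # label assumed
morquio.add_causes("Morquio", "platyspondyly", "uc")            # label assumed
failure_a = FailureModel("failure-A")                           # placeholder
failure_a.add_causes("failure-A", "failure-B", "mc")
failure_b = FailureModel("failure-B")                           # placeholder
```

Because each failure owns its model, the same model can be instantiated several times for a recurring failure, and a failure such as the placeholder `failure-B` can still appear as an ordinary node inside another failure's model.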

4.2 Case History 

The relevant history of a case consists of those assertions whose valid time is 
covered by the concrete time-axis corresponding to the time window of the dia- 
gnostic activity. Each selected temporal assertion corresponds to a (concrete) 
time-object. We assume that the number of time-objects comprising a case hi-
story is kept to the minimum possible, by performing appropriate merges as
well as other forms of temporal data abstraction on the raw time-objects.
Furthermore, potential causality dependencies between these time-objects are
investigated through the application of axiom 3 and where a causality-link is
established to hold it is appropriately instantiated. Some of the time-objects
comprising the case history are contextual: they do not need any explanation.
These usually assert past failures, or past or ongoing therapeutic actions.

4.3 Therapeutic Actions 

A pure diagnostic system is not required to plan and monitor the execution 
of therapeutic actions. Still it should have an understanding of the notion of a 
therapeutic action, or more generally the notion of an action. If the case history 
mentions past or ongoing therapeutic actions the system should understand their 
effects. For example the model of a failure may be different in the context of such 
actions (associated faults are nullified or accentuated (prolonged)). 
Knowledge about therapeutic actions is part of the background component
of the diagnostic theory. Each generic therapeutic action is represented in terms
of the action per se, its preconditions and effects. All these are time-objects.
At a generic level, the action is related to its effects through instances of re-
lation causes. Knowledge of preconditions of actions is relevant to the task of
a therapy planner, but not to the task of a diagnostic system that only needs 
to know the potential effects of such actions. For example, for each action (in- 
stantiation) included in the case history, axiom 2 is applied with respect to each 
of the causes relations between the action and its effects, in order to decide 
which causality-links actually hold; the effects corresponding to these links are 
also recorded in the case history. Entering such time-objects (effects of thera- 
peutic actions) may result in revoking or clipping the persistence of predicted 
observations in the case history, if any. 

At any time, special time-object now is positioned on the concrete time-axis, 
thus depicting which time-objects (actual, hypothesised, or expected) are past, 
ongoing, or future. 



5 Diagnostic Solutions: Instantiating Failure Models 

At any time, t, there are a number of potential diagnostic solutions, or hypothe- 
tical worlds. A hypothetical world consists of instantiated failure models. 

There are three ways to trigger (abduce) a failure model: (i) through primary 
triggers; (ii) through secondary triggers; and (iii) through another failure’s in- 
stantiation which includes the given failure as a causal state node. In the latter 
case the failure is a consequence of another failure that has already been hypo-
thesised; the implicated failure is instantiated if an admissible path (see section 6)
leading to it is established to hold in the given hypothetical world (reasoning 
forwards in time) and its conjectured existence does not refer to the future. 




Fig. 2. Primary triggers for Morquio 



A failure is associated with a number of primary triggers. In some theo- 
ries every observable node in a causal model could potentially act as a primary 
trigger. However, for better focusing and higher compatibility with human dia-
gnostician practices, the role of a primary trigger is reserved for a small subset
of these nodes. A primary trigger is some relatively cheap (easily obtainable)
information (e.g. striking abnormality, observable primary cause, contextual in-
formation, etc.) which directs attention to the particular failure. The primary
triggers for Morquio are illustrated in figure 2. These are also represented as
time-objects. Comparing the primary triggers with the corresponding manife-
stations (faults) in the Morquio model (figure 1), it can be seen that the pri-
mary triggers are less restrictive, both with respect to their properties and their
existences. The existence of most of the depicted triggers is in fact the default 
“at any time” (implied by the non-specification of start and finish). This is be- 
cause most of the actual triggers are expected to be point-objects. The primary 
trigger “platyspondyly at any time” is considerably less restrictive than the cor- 
responding manifestation which is “platyspondyly throughout from the age of 
1 year onwards”. Two of the triggers relate to the disorder’s expectation re- 
garding the progressive resorption of femoral-capital epiphyses. Once again the 
triggers are considerably less restrictive. For example the temporal constraints 
specifying the particular form of resorption are missing. Thus primary triggers 
by virtue of being less restrictive than corresponding faults in a failure model, 
simply provide heuristic guidance in the generation of diagnostic hypotheses, 
and they are by no means infallible; after all the same trigger can be associated 
with many failures. For example the hypothesis of Morquio will be triggered on 
the basis of “femoral-capital epiphyses absent at-birth” in spite of the fact that 
this hypothesis is in conflict with this observation. Primary triggers represent a 
context-free mechanism for the formation of hypotheses. The notion of a primary 
trigger, as a prime mechanism for the formation of hypotheses, has been used in 
early abductive diagnostic systems [22]. The contribution of our approach is in
having temporal primary triggers. 

Formally, a primary trigger for some failure, $\Phi$, is expressed as the triple
$(\tau, conds, f_I)$ where $\tau$ is an abstract time-object, $conds$ is a list of conditions
and $f_I$ is an instantiation function. The semantics are that if $\tau$’s abstract exi-
stence can be mapped onto the concrete time-axis underlying the case history,
the case history accounts for concrete-$\tau$ (predicate accounts-for is defined in sec-
tion 6), and all the specified conditions are satisfied (by the case history), the
instantiation function $f_I$ is applied to the abstract model of $\Phi$ and concrete-$\tau$, to
return the concrete instantiation of $\Phi$, i.e. to determine the particular existence
of the failure on the concrete time-axis. Once a failure, which forms the star-
ting state in the relevant causal model, is positioned on the concrete time-axis,
the (abstract) existences of all the other time-objects in the model are concre- 
tized. Finally axiom 2 is applied with respect to every causes arc to see which 
causality-links actually hold. Potential causality-links, i.e. those whose truth 
status is unknown (possibly because their underlying conditions refer to future 
happenings), are also kept. However refuted causality-links are deleted and 
any components of the causal model that become disconnected as a result of 
such deletions are also deleted. 
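
The pruning step described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function name `prune_causal_model`, the status labels and the toy node names are hypothetical, and "disconnected" is read here as "no longer reachable from the starting state along surviving arcs".

```python
from collections import deque

def prune_causal_model(start, links):
    """Drop refuted causality-links, then drop whatever is cut off from `start`.

    links maps (cause, effect) -> "holds" | "potential" | "refuted"
    (hypothetical labels for the three truth statuses in the text).
    """
    # Links that hold and potential links are both kept.
    kept = {edge: status for edge, status in links.items() if status != "refuted"}
    successors = {}
    for cause, effect in kept:
        successors.setdefault(cause, []).append(effect)
    # Breadth-first walk from the starting state over surviving links.
    reachable, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in successors.get(node, []):
            if nxt not in reachable:
                reachable.add(nxt)
                queue.append(nxt)
    # Components that became disconnected are deleted as well.
    kept = {(c, e): s for (c, e), s in kept.items() if c in reachable}
    return kept, reachable

links = {
    ("failure", "m1"): "holds",
    ("failure", "m2"): "refuted",
    ("m2", "m3"): "potential",
    ("m1", "m4"): "potential",
}
kept, reachable = prune_causal_model("failure", links)
```

In this toy model the refuted arc to `m2` is removed, and `m3`, which depended on `m2` alone, disappears with it.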

Multiple primary trigger activations for the same failure model are possible.
Some of these could refer to the same trigger, thus capturing recurring events. 
If all activated triggers make compatible suggestions regarding the concrete exi- 
stence (instantiation) of the failure then no splitting of the hypothetical world 
is necessary. Similarly no splitting is necessary if the activated triggers suggest 
distinct occurrences of the same recurring failure; simply all distinct instantiati- 
ons of that failure are included in the same hypothetical world; these may sub- 
sequently be abstracted to a (compound) periodic occurrence of the particular 
failure. However, splitting of a world would be necessary if conflicting existences 
of the same occurrence of the failure are suggested, e.g. a non-recurring failure 
cannot have two disjoint occurrences. 

Secondary triggers interrelate failure models. They are of two types, com- 
plementary and opposing. A complementary secondary trigger suggests the in- 
stantiation of another failure in conjunction with some instantiated failure. An 
opposing secondary trigger suggests the replacement of some failure with another 
failure. The format of a secondary trigger for some failure $\Phi$ is $(\tau, \mathit{conds}, \mathit{fi}, \Phi')$
where $\tau$ is an abstract time-object, conds is a list of conditions, fi is an instantiation
function and $\Phi'$ is a complementary/alternative failure. Its semantics
is similar to that of a primary trigger. Secondary triggers provide a context- 
sensitive mechanism for the formation of hypotheses. 

6 Accountings and Conflicts 

In this section we define binary predicates accounts-for and in-conflict-with which 
form the basis for the definition of evaluation criteria for hypothetical worlds. 
Different notions of plausible and best explanation can be composed from such 
criteria [14]. Predicates accounts-for($\tau_i$, $\tau_j$) and in-conflict-with($\tau_i$, $\tau_j$) take time-objects
as arguments. Instances of these predicates are evaluated with respect
to some consistent collection of time-objects and their interrelationships, the 
evaluation domain, e.g. the case history, or a hypothetical world. By default this 
is taken to be the domain of the first argument: 

- accounts-for($\tau_i$, $\tau_j$): time-object $\tau_i$'s assertion, in the given evaluation domain,
can account for time-object $\tau_j$ being asserted in the same evaluation
domain. The predicate is reflexive and transitive.

- in-conflict-with($\tau_i$, $\tau_j$): time-object $\tau_i$'s assertion, in the given evaluation domain,
denies the assertion of time-object $\tau_j$ in the same evaluation domain.
The predicate is symmetric.

The predicates are defined through the following axioms: 

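Axiom 4: $\text{accounts-for}(\tau_i, \tau_j) \Leftarrow ((\pi(\tau_i) \Rightarrow \pi(\tau_j)) \wedge \tau_j \subseteq \tau_i)$.

Axiom 5: $\text{accounts-for}(\tau_i, \tau_j) \Leftarrow \exists \tau_k\, (\text{contains}(\tau_i, \tau_k) \wedge \text{accounts-for}(\tau_k, \tau_j))$.

Axiom 6: $\text{accounts-for}(\tau_i, \tau_j) \Leftarrow \exists \tau_k\, (\text{causality-link}(\tau_i, \tau_k) \wedge \text{accounts-for}(\tau_k, \tau_j))$.

Axiom 7: $\text{accounts-for}(\tau_i, \tau_j) \Leftarrow \exists \tau_k\, \tau_n\, \tau_{n'}\, (\text{causality-link}(\tau_i, \tau_k) \wedge \text{contains}(\tau_n, \tau_k) \wedge \text{causality-link}(\tau_n, \tau_{n'}) \wedge \text{accounts-for}(\tau_{n'}, \tau_j) \wedge \text{assumed}(\tau_n))$.

Axiom 8: $\text{in-conflict-with}(\tau_i, \tau_j) \Leftarrow (\text{excludes}(\pi(\tau_i), \pi(\tau_j)) \wedge \neg(\tau_i \text{ disjoint } \tau_j))$.

Axiom 9: $\text{in-conflict-with}(\tau_i, \tau_j) \Leftarrow \exists \tau_k\, (\text{contains}(\tau_i, \tau_k) \wedge \text{in-conflict-with}(\tau_k, \tau_j))$.

Axiom 10: $\text{in-conflict-with}(\tau_i, \tau_j) \Leftarrow \exists \tau_k\, (\text{causality-link}(\tau_i, \tau_k) \wedge \text{in-conflict-with}(\tau_k, \tau_j))$.

Axiom 11: $\text{in-conflict-with}(\tau_i, \tau_j) \Leftarrow \exists \tau_k\, \tau_n\, \tau_{n'}\, (\text{causality-link}(\tau_i, \tau_k) \wedge \text{contains}(\tau_n, \tau_k) \wedge \text{causality-link}(\tau_n, \tau_{n'}) \wedge \text{in-conflict-with}(\tau_{n'}, \tau_j) \wedge \text{assumed}(\tau_n))$.

Predicate assumed is defined as follows:

Axiom 12: $\text{assumed}(\tau_i) \Leftarrow \exists \tau_j\, (\text{assumed}(\tau_j) \wedge \text{causality-link}(\tau_j, \tau_i))$.

Axiom 13: $\text{assumed}(\tau_i) \Leftarrow \neg(\exists \tau_j\, (\text{contains}(\tau_i, \tau_j) \wedge \neg\,\text{assumed}(\tau_j)))$.

Axiom 14: $\text{assumed}(\tau_i) \Leftarrow \exists \tau_j\, (\text{contains}(\tau_j, \tau_i) \wedge \text{assumed}(\tau_j))$.

Axiom 15: $\text{assumed}(\tau_i) \Leftarrow \neg(\exists \tau_j\, \text{causes}(\tau_j, \tau_i, -, -))$.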

Thus, a time-object, $\tau_i$, accounts for another time-object, $\tau_j$, either directly
(axiom 4) or indirectly through one (if any) of its component time-objects (axiom
5), or one (if any) of its established causal consequent time-objects (axioms 6
and 7). For a direct accounting, the property of $\tau_i$ implies (i.e. subsumes) the
property of $\tau_j$, and the existence of $\tau_i$ covers completely the existence of $\tau_j$.
Partial accountings are not dealt with. Predicate in-conflict-with is similarly
defined. For a direct conflict, the properties of $\tau_i$ and $\tau_j$ are mutually exclusive,
and the existences of $\tau_i$ and $\tau_j$ are not disjoint.

Predicates accounts-for, in-conflict-with, and assumed are computationally
expensive. Each of these entails branching based on causes and containment arcs.
More specifically, accounts-for($\tau_i$, $\tau_j$) / in-conflict-with($\tau_i$, $\tau_j$) generates a search
which grows forwards in time with the objective of piecing together an admissible
path from node $\tau_i$ to some node, $\tau_k$ say, which directly subsumes/conflicts with
$\tau_j$ ($\tau_i$ could be the same as $\tau_k$); the initial node, $\tau_i$, is hypothesised or directly
assumed, e.g. observed.
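
The forward search just described can be illustrated with a small sketch covering axioms 4, 5 and 6 only (axiom 7 and the assumed predicate are left out). Everything here is an assumption made for illustration: the `TimeObject` class, the representation of an existence as a closed integer interval, the `implies` set of property pairs and the numeric ages are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TimeObject:
    name: str
    prop: str    # the property of the time-object
    start: int   # existence modelled as a closed interval [start, end]
    end: int

def covers(a, b):
    """True iff the existence of a completely covers the existence of b."""
    return a.start <= b.start and b.end <= a.end

def accounts_for(ti, tj, implies, contains, causal, seen=None):
    """Depth-first search for an accounting of tj starting from ti."""
    seen = set() if seen is None else seen
    if ti.name in seen:          # guard against cycles
        return False
    seen.add(ti.name)
    # Axiom 4: direct accounting (property subsumption plus temporal cover).
    subsumes = ti.prop == tj.prop or (ti.prop, tj.prop) in implies
    if subsumes and covers(ti, tj):
        return True
    # Axiom 5 (components) and axiom 6 (established causal consequents).
    for tk in contains.get(ti.name, []) + causal.get(ti.name, []):
        if accounts_for(tk, tj, implies, contains, causal, seen):
            return True
    return False

failure = TimeObject("F", "morquio", 0, 100)
manifestation = TimeObject("M", "platyspondyly", 1, 100)
observation = TimeObject("O", "platyspondyly", 5, 10)
causal = {"F": [manifestation]}   # causality-link(F, M)
result = accounts_for(failure, observation, set(), {}, causal)
```

Here the hypothesised failure accounts for the observation through its causal consequent, whose existence covers that of the observation.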

Definition 1. Admissible Path

A sequence of time-objects $\tau_1, \tau_2, \ldots, \tau_n$ forms an admissible path iff
$\forall i = 1, \ldots, (n-1)$ $(\text{causality-link}(\tau_i, \tau_{i+1}) \vee \text{contains}(\tau_i, \tau_{i+1}) \vee
\text{isa-component-of}(\tau_i, \tau_{i+1}))$.

Let $\tau_i, \tau_j, \tau_k$ be three consecutive time-objects on some admissible path,
where $R_{ij}(\tau_i, \tau_j)$ and $R_{jk}(\tau_j, \tau_k)$. On the basis of axioms 7 and 11 it follows
that $(R_{ij} \neq \text{isa-component-of} \vee R_{jk} \neq \text{contains})$.
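
A minimal sketch of a checker for Definition 1, together with the restriction on consecutive relations stated above, might look as follows. The function name `is_admissible` and the dictionary encoding of the arcs are assumptions made for this example.

```python
CAUSAL, CONTAINS, COMPONENT = "causality-link", "contains", "isa-component-of"

def is_admissible(path, relation):
    """path: list of node names; relation: dict (a, b) -> relation name.

    Every consecutive pair must be linked by one of the three relations,
    and an isa-component-of step may not be followed by a contains step.
    """
    labels = []
    for a, b in zip(path, path[1:]):
        r = relation.get((a, b))
        if r not in (CAUSAL, CONTAINS, COMPONENT):
            return False
        labels.append(r)
    return all(not (r1 == COMPONENT and r2 == CONTAINS)
               for r1, r2 in zip(labels, labels[1:]))

relation = {
    ("t1", "t2"): CAUSAL,
    ("t2", "t3"): COMPONENT,
    ("t3", "t4"): CAUSAL,
    ("t3", "t5"): CONTAINS,
}
```

With these arcs, `["t1", "t2", "t3", "t4"]` is admissible, whereas `["t2", "t3", "t5"]` is rejected because it climbs to a compound object and immediately descends into one of its components.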

The derivation of predicate assumed($\tau_i$) generates a search which grows backwards
in time with the objective of piecing together (in a backwards fashion) an
admissible path from some node $\tau_j$, that may be directly assumed, to node $\tau_i$. A
node is directly assumed, in some context (say a hypothetical world), if it does 
not have any potential causal antecedents. This is not to say that the particular 
node is necessarily a primary cause, just that it may be considered a ‘starting’ 
state in the particular context. A compound node is assumed if none of its ex- 
pected components is revoked (i.e. each of them may be assumed). Similarly a 
node is assumed either because it is contained in an assumed node or because it
has an assumed direct causal antecedent. 

Axioms 4-11 treat their second argument as an atomic time-object. The following
axioms apply to compound time-objects and derive an accounting/conflict
through their components:

Axiom 16: $\text{accounts-for}(\tau_i, \tau_j) \Leftarrow \forall \tau_k \text{ s.t. } \text{contains}(\tau_j, \tau_k)\ \{\text{accounts-for}(\tau_i, \tau_k)\}$.

Axiom 17: $\text{in-conflict-with}(\tau_i, \tau_j) \Leftarrow \exists \tau_k \text{ s.t. } \text{contains}(\tau_j, \tau_k)\ \{\text{in-conflict-with}(\tau_i, \tau_k)\}$.

The case history, CH, or some potential diagnostic solution, S, can be viewed 
as compound time-objects containing all the time-objects which comprise them. 
Hence we can formulate, in a simple way, compound queries such as “accounts-for(S, O)?”
or “in-conflict-with(CH, S)?”, where O is the compound time-object
comprising all the abnormal observations. The evaluation domains for these queries
consist of the single time-objects S and CH respectively.

7 Conclusions 

In this paper we have focussed on the integration of time in an abductive dia- 
gnostic context. Time is intrinsically relevant to many diagnostic problems, and 
as such temporal reasoning plays a central role in the formation and evaluation 
of potential diagnostic solutions. Time should therefore be an integral aspect of 
the knowledge and reasoning of diagnostic systems for such domains. This inte- 
gration can be achieved by treating time as an integral aspect of the entities that 
constitute the processing elements of the systems. The notion of a time-object 
captures this requirement. 

The work presented in this paper is ongoing. The domain of skeletal dys- 
plasias and malformation syndromes has been our main source and testbed of 
the proposed ideas, and the practical results obtained so far (through the SDD 
system [24]) are very encouraging. Undoubtedly further refinements and
extensions are in order.

Abstract. We present three approaches to revision of belief bases, which
are also examined in the case in which the sentences in the base are 
partitioned between those which can and those which cannot be changed; 
the approaches are shown to be semantically equivalent. A new approach 
is then presented, based on the modification of individual rules, instead of
deletion. The resulting base is semantically equivalent to that generated 
by the other approaches, in the sense that it has the same models, but 
the rule part alone has fewer models, that is, is subjected to a smaller 
change. 



Introduction 

Belief revision faces the problem of maintaining the consistency of a system of 
beliefs when new pieces of information are added, and, at the same time, it 
should preserve as many beliefs compatible with the new data as possible. In 
other words, given a set of beliefs and a new belief, we want to find a new set 
of beliefs which includes the new belief and differs as little as possible from the 
old set. Classical logic has the ex absurdo sequitur quodlibet principle, i.e., from
a contradiction anything follows; so, finding a contradiction in a system should
render it completely useless. We know that things go differently. Contradictions
are tackled trying to repair systems locally, and keeping modifications as small 
as possible. Systems (not only logical ones, but also normative systems) are very 
conservative - or, at least, this is how we behave about them. Belief revision 
theories are the logic tools for keeping change small. Different specifications of 
the problem result in different formal treatments. 

The first watershed is between theories dealing with sets closed w.r.t. logical
consequence (belief sets) and theories dealing with (finite) non-closed sets
(belief bases). Belief sets are simpler; the principle of irrelevance of syntax, for
instance, is trivially satisfied: it does not matter how I describe a set, because 
what is important is the set (closed w.r.t. logical consequence) - the principle of 
extensionality. Belief bases, on the contrary, are much more realistic. Our kno- 
wledge is described by finite sets of sentences, and so are the norms regulating 
our daily behavior and all sorts of databases. Any change affects only finite sets 
of sentences. This makes it more difficult to satisfy (and even to state clearly) the
intuitively appealing principle of irrelevance of syntax.¹

There are then two distinctions, originating from the same source: the first is 
made between purely logical approaches and approaches assuming the existence 
of extra-logical properties of the sentences; the second among approaches looking 
for a unique resulting revision and approaches admitting multiple results. The 
source of the problem is that there may be different candidates for a revision (of 
a set or base). While this is not a theoretical problem, it is a hard practical difficulty
when trying to put the theory of belief revision to work (e.g., in a deductive
database system). We are confronted with two alternatives: a non-deterministic 
choice among the candidates, or building a unique result as a combination of the 
candidates; relying on extra-logical properties of the sentences may, sometimes, 
find a unique revision, but, more often, it simply elects a subset of candidates, 
and the problem of finding a unique result is still there. 

A special case of extra-logical properties, pertaining to finite bases, is the 
distinction of sentences between those that can be affected by the revision and 
those that cannot be affected. The distinction is rather common: it suffices to 
consider the defeasible and non-defeasible rules of non-monotonic logics, or, from 
another point of view, the distinction between observed facts and theoretical 
constructs of (some approaches to) epistemology. As a mnemonic, we shall speak 
of rules (sentences that can be changed) and facts (sentences that cannot be 
changed).²

This paper deals with approaches to finite base revision, looking for a unique 
solution to the revision, and using, as sole extra-logical property, the distinction 
between rules and facts. When we use the word revision we intend this type 
of revision, unless otherwise stated. We describe three approaches to revision, 
and show that they are equivalent, in the sense that they result in the same 
bases. These approaches fail to satisfy the principle of extensionality: equivalent 

¹ It may be argued that syntax is far from irrelevant, at least in some contexts. This
is an example from a normative context.

Let D be a system containing the following norms: 1) “adults may vote and drive
cars” 2) “people become adults at 18”.

Of course, 3) “people aged 18 may vote”, and 4) “people aged 18 may drive
cars” are consequences of D. If we accept that systems bearing the same consequences
are essentially the same (irrelevance of syntax), then D is equivalent to D'
explicitly made of: 1) “adults may vote and drive cars”, 2) “people become adults
at 18”, and 3) “people aged 18 may vote”. 4), of course, is a consequence of D'.
Now, a bill changing the age in 2) to 21 passes. D is revised into E: 1) “adults may
vote and drive cars”, 2) “people become adults at 21”; and D' is revised into E': 1)
“adults may vote and drive cars”, 2) “people become adults at 21”, and 3) “people
aged 18 may vote”. In both, driving requires being 21, but only in E' is it possible
to vote at 18.

² Another view, originating from the similarities between belief revision and the evolution
of normative systems, holds that sentences which cannot be modified are
“intended consequences” of regulations (norms): if a regulation conflicts with an 
intended consequence, it is the norm that should be changed. 
sets of rules may result, after revision, in non-equivalent sets of rules. Then, we
present a new approach to revision enjoying the following properties: the set 
of models of the resulting base is the same set of models of the base obtained 
with the other approaches, and the set of models of the resulting rule set is not 
larger than the set of models of the rule sets resulting from the other approaches. 
Moreover, the approach is fully algorithmic, and lends itself to a straightforward 
implementation using tableaux systems. 

In section 1, following AGM, we quickly review the common ground of revision
of closed belief sets. In section 2 we describe three approaches to base revision,
and we show that they essentially lead to the same result. In section 3 we present
a new approach, in which sentences are modified instead of being deleted, and
we discuss its relationship with AGM postulates. In a later section we show, with
the help of some examples, the differences between the new method and the
approaches described in section 2. Theorems are stated without proof, and the
mathematical apparatus is kept to a minimum.



1 Revision for Belief Sets 

The baseline for the modern treatment of belief revision is usually taken to be the 
result of the joint efforts of Alchourron, Makinson and Gardenfors. The theory 
they eventually arrived at is known as the AGM approach to belief revision (or,
simply, AGM). This approach is fully described in [Zj; later work in the same 
line includes m [H]. 

The central notion in AGM is that of belief set. A belief set is a set of sentences 
(of some propositional language) such that it may be rationally held by an 
individual, that is m, ch.2.2), a consistent set closed under logical consequence 
(i.e., a theory). 

Definition 1. A set K of sentences is a non-absurd belief set iff (i) $\bot$ is not a
logical consequence of the sentences in K and (ii) if $K \vdash b$ then $b \in K$.

Let Cn(S) denote the set of the consequences of a set S; then K = Cn(K) holds
for belief sets. The set of all sentences in the language is also a belief set, namely
the absurd belief set $K_\bot$. Belief sets may be infinite, but, usually, they are not
maximal, i.e., given a sentence a of the language, it is possible that neither $a \in K$
nor $\neg a \in K$. If $a \in K$, then a is accepted in K; if $\neg a \in K$, then a is rejected;
otherwise a is undetermined. Three basic kinds of change of belief are identified:
expansion, contraction, revision. The table below summarizes the meaning and
notation for them.



previous state         expansion $K^{+}a$    contraction $K^{-}a$    revision $K^{*}a$
a accepted in K        a accepted in K       a undetermined in K     a accepted in K
a undetermined in K    a accepted in K       a undetermined in K     a accepted in K
a rejected in K        $K_\bot$              a rejected in K         a accepted in K



Instead of giving operational definitions for the three operations, AGM give 
postulates for them intended to constrain the class of all possible operational 
definitions ( 0 , ch.3). Here we recall only the postulates for revision. For the
postulates for expansion and contraction the reader is referred to (?]. The postulates
for expansion define a unique operation, $K^{+}a = Cn(K \cup \{a\})$; on the other
hand the postulates for contraction and revision do not define a unique operation.
Among the postulates the most controversial is the so-called postulate of
recovery: if $a \in K$, then $K = (K^{-}a)^{+}a$. Revision and contraction are usually
linked by the so-called Levi Identity: $K^{*}a = (K^{-}\neg a)^{+}a$.

Definition 2. The AGM revision of a belief set K by a proposition a is a set
$K^{*}a$ such that:

(K*1) $K^{*}a$ is a belief set;

(K*2) $a \in K^{*}a$;

(K*3) $K^{*}a \subseteq K^{+}a$;

(K*4) if $\neg a \notin K$, then $K^{+}a \subseteq K^{*}a$;

(K*5) $K^{*}a = K_\bot$ iff $\neg a$ is logically true;

(K*6) if $a \leftrightarrow b$ is logically valid, then $K^{*}a = K^{*}b$;

(K*7) $K^{*}(a \wedge b) \subseteq (K^{*}a)^{+}b$;

(K*8) if $\neg b \notin K^{*}a$, then $(K^{*}a)^{+}b \subseteq K^{*}(a \wedge b)$.

Notice that from K*3 and K*4 we obtain that if $\neg a \notin K$, then $K^{*}a = K^{+}a$;
moreover if $a \in K$, then $K^{*}a = K$. Two properties of revision that can be
derived from K*1-K*8 are: $K^{*}a = K^{*}b$ iff $b \in K^{*}a$ and $a \in K^{*}b$, and

$$K^{*}(a \vee b) = \begin{cases} K^{*}a & \text{or} \\ K^{*}b & \text{or} \\ K^{*}a \cap K^{*}b \end{cases} \qquad (1)$$

Thanks to the Levi identity, the discussion may be limited to contraction, as we
may derive revision accordingly.

The result of contraction is a subset of the original set, so we have to look for
a belief set K' which is a subset of K such that $a \notin K'$; the principle of minimal
change suggests that we look for the unique largest of such subsets. It is easy
to see that, in general, there is no such unique set. In general, we may find a
family of sets, which is called $K \perp a$ (see the next section for a formal definition).
The most natural candidate for contraction is Full Meet Contraction.

$$K^{-}a = \begin{cases} \bigcap (K \perp a) & \text{if } K \perp a \text{ is non-empty} \\ K & \text{otherwise} \end{cases} \qquad \text{(Full Meet Contraction)}$$

Unfortunately it suffers from a major drawback:

Theorem 3. If $a \in K$ and $K^{-}a$ is defined as Full Meet Contraction, then $b \in K^{-}a$
iff (a) $b \in K$; (b) $\neg a \rightarrow b$ is logically true.

In other words, in this case $K^{-}a = K \cap Cn(\neg a)$. Contraction results in a very
small set and minimal change is not respected. On the other hand, if we take
only one of the sets in $K \perp a$ (maxichoice contraction), we are stuck in a non-deterministic
choice. To overcome this shortcoming AGM defines contraction as
the intersection of some preferred elements of $K \perp a$. To this end we stipulate an
order relation $\leq$ over $2^K$; the intersection is then taken only over the maximal
elements of $K \perp a$: as we intersect fewer sets, we may get a larger result.

2 Revision for Belief Bases 

AGM applies to sets which are closed under the consequence operator, that is,
sets K such that K = Cn(K). Although some of the original papers in that tradition
deal with sets which are not closed under Cn (see, e.g., [E]), the theory was
developed mainly in the direction of closed sets. A different approach maintains
that the assumption of closed sets is too unrealistic to be really significant as a
model of how belief works. A belief base B is a set of propositions. The set of
consequences of B, under a suitable consequence operator Cn, is $Cn(B) \supseteq B$. The
idea is that a realistic approach to belief representation must take into account
that any agent has only a finite set of beliefs, possibly relying on them in order
to derive an infinite set of consequences. Any normative system, for instance,
is finite, and revising a normative system means revising a finite set of norms
actually comprising it. This approach is discussed, among others, in a, na, [loi,
El, [16]. First we describe bases in which all sentences may be changed, then
bases partitioned in rules and facts.

2.1 Bases Unpartitioned 

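We start from the definition of three families of interesting subsets of B.

Definition 4. $B \perp a$, $B \Rightarrow a$, $B \dagger a$ are defined as follows.

- $B \perp a$ is the family of maximal subsets of B not implying a:
$C \in B \perp a$ iff $C \subseteq B$, $a \notin Cn(C)$, and if $b \notin C$ and $b \in B$, then $a \in Cn(C \cup \{b\})$.

- $B \Rightarrow a$ is the family of minimal subsets of B implying a:
$C \in B \Rightarrow a$ iff $C \subseteq B$, $a \in Cn(C)$, and if $D \subset C$, then $D \notin B \Rightarrow a$.

- $B \dagger a$ is the family of minimal incisions of a from B such that:
$I \in B \dagger a$ iff $I \subseteq B$, $C \cap I \neq \emptyset$ for each $C \in B \Rightarrow a$, and
if $D \subset I$, there is $C \in B \Rightarrow a$ such that $C \cap D = \emptyset$.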
It is worth noting that if a is logically true then $B \perp a = \emptyset$, $B \Rightarrow a = \{\emptyset\}$, $B \dagger a = \emptyset$.

Moreover, if B is finite any element C of one of the families above is finite, and 
there is a finite number of them. Different definitions of revision are obtained 
as functions of the above defined families. For simplicity, we define at first the 
contraction functions, then, according to Levi identity, the revision functions are 
obtained by adding the new data to the output of the contraction functions. The 
main idea is that of simple base contraction, using $B \perp a$. Contraction is defined
as the intersection of all subsets in $B \perp a$. (We modify the original notion given
in 0, m in order to retain the finiteness of the base) . 
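
For a small finite base the three families of Definition 4 can be enumerated by brute force over truth tables. The sketch below is illustrative only: the function names, the three-sentence base and the encoding of sentences as Python truth functions are all assumptions, and the exponential enumeration is meant for toy examples.

```python
from itertools import combinations, product

ATOMS = ("p", "q")

# Hypothetical base B: each named sentence is a truth function over ATOMS.
BASE = {
    "p": lambda v: v["p"],
    "p->q": lambda v: (not v["p"]) or v["q"],
    "q": lambda v: v["q"],
}

def implies(names, a):
    """True iff the sentences named in `names` classically entail `a`."""
    for values in product([False, True], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(BASE[n](v) for n in names) and not a(v):
            return False
    return True

def subsets(names):
    items = sorted(names)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def maximal_non_implying(a):
    """Family of maximal subsets of B not implying a."""
    ok = [s for s in subsets(BASE) if not implies(s, a)]
    return {s for s in ok if not any(s < t for t in ok)}

def minimal_implying(a):
    """Family of minimal subsets of B implying a."""
    ok = [s for s in subsets(BASE) if implies(s, a)]
    return {s for s in ok if not any(t < s for t in ok)}

def minimal_incisions(a):
    """Family of minimal sets meeting every minimal subset implying a."""
    kernels = minimal_implying(a)
    hits = [s for s in subsets(BASE) if all(s & k for k in kernels)]
    return {s for s in hits if not any(t < s for t in hits)}

def q(v):
    return v["q"]

def tautology(v):
    return True
```

For a = q this yields two maximal non-implying subsets, two minimal implying subsets and two minimal incisions; for a logically true sentence the functions return the degenerate families noted in the text.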

A different approach was introduced in [1] for belief sets, but may be easily
adapted to belief bases. The safe contraction is the base obtained by deleting all
possible minimal subsets implying a. Deleting the elements of an incision on
$B \Rightarrow a$ transforms B into a set B' such that $a \notin Cn(B')$. Deleting the elements of all
possible incisions (all possible choices of elements from $B \Rightarrow a$) we get the excided
contraction.

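Definition 5. Simple base, safe, and excided contraction and revision are defined
as follows:

type          contraction                                                                     symbol
simple base   $\bigcap_{C \in B \perp a} C$                                                   $B \ominus_1 a$
safe          $\{b \mid b \in B \text{ and } b \notin \bigcup_{C \in B \Rightarrow a} C\}$    $B \ominus_2 a$
excided       $\{b \mid b \in B \text{ and } b \notin \bigcup_{I \in B \dagger a} I\}$        $B \ominus_3 a$

revision      $B \oplus_i a = (B \ominus_i \neg a) \cup \{a\}$,   $i = 1, 2, 3$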

The three contraction (and revision) functions share a common flavour. First of 
all, all of them use a set-theoretic operation on all the sets of a certain family. 
They might be called full operations, in the sense in which the adjective “full” 
pertains to full meet contraction according to AGM. 

We study now the relationships between the sets defined by these operations. 

Theorem 6. If $a \in Cn(B)$ and B is finite, then $B \ominus_1 a = B \ominus_2 a = B \ominus_3 a$.

The same is not true, however, when only one set is considered. We are guaranteed
that

- $a \notin Cn(\{b \mid b \in B \text{ and } b \notin I\})$, for $I \in B \dagger a$, and that

- $a \notin Cn(\{b \mid b \in C\})$, for $C \in B \perp a$,

but we are not guaranteed that

- $a \notin Cn(\{b \mid b \in B \text{ and } b \notin C\})$, for $C \in B \Rightarrow a$.

Following what is defined for belief sets, we call this type of operations maxichoice
operations. In a sense, the notions of $B \perp a$ and $B \dagger a$ are more robust than
that of $B \Rightarrow a$, because they allow not only full operations, but also maxichoice
operations which satisfy the minimal requirement of success. Maxichoice ope- 
rations, however, even if they satisfy the success postulate, do not satisfy the 
uniqueness requirement, that is, the requirement that the result of contraction 
or revision is well defined. This is usually considered unacceptable. 

The problem with full approaches is essentially the same as that with full
meet contraction: they usually contract too much, so the resulting set is too small.
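
The coincidence of the three full contractions, and how much they remove, can be checked on a toy base. This is a brute-force sketch under stated assumptions: the four-sentence base, the helper names and the truth-function encoding are hypothetical, and the sentence to contract is assumed not to be logically true.

```python
from itertools import combinations, product

ATOMS = ("p", "q", "r")
BASE = {  # hypothetical base; "r" is unrelated to q
    "p": lambda v: v["p"],
    "p->q": lambda v: (not v["p"]) or v["q"],
    "q": lambda v: v["q"],
    "r": lambda v: v["r"],
}

def implies(names, a):
    for values in product([False, True], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(BASE[n](v) for n in names) and not a(v):
            return False
    return True

def subsets(names):
    items = sorted(names)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def family(a, keep, minimal):
    """Subsets satisfying `keep`, filtered to the minimal or maximal ones."""
    ok = [s for s in subsets(BASE) if keep(s)]
    if minimal:
        return [s for s in ok if not any(t < s for t in ok)]
    return [s for s in ok if not any(s < t for t in ok)]

def simple_base_contraction(a):
    remainders = family(a, lambda s: not implies(s, a), minimal=False)
    return frozenset.intersection(*remainders)  # assumes a is not a tautology

def safe_contraction(a):
    kernels = family(a, lambda s: implies(s, a), minimal=True)
    return frozenset(BASE) - frozenset().union(*kernels)

def excided_contraction(a):
    kernels = family(a, lambda s: implies(s, a), minimal=True)
    incisions = family(a, lambda s: all(s & k for k in kernels), minimal=True)
    return frozenset(BASE) - frozenset().union(*incisions)

def q(v):
    return v["q"]

results = (simple_base_contraction(q), safe_contraction(q), excided_contraction(q))
```

All three functions return the same set here, and only the sentence unrelated to q survives, which illustrates the remark that full operations tend to contract too much.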

2.2 Bases Partitioned in Rules and Facts

Now we introduce the distinction between facts and rules. Facts are sentences
which are not subject to change in the process of revision, while rules may be
changed. If B is a base, the sets of facts and rules will be denoted $B_\varphi$ and $B_\rho$.
The operations studied in the previous section are extended to this case.

We first examine an extension to the $\ominus_1$ operation, called prioritized base
revision, the idea of which is due to m. In a prioritized revision, we first take
all facts not implying a, then we add as many rules as we can without implying
a, in all different ways.

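Definition 7. $B \Downarrow a$ is the family of the sets $C = (C_\rho, C_\varphi)$, such that:

(a) $C_\rho \subseteq B_\rho$ and $C_\varphi \subseteq B_\varphi$;

(b) $a \notin Cn(C_\varphi)$ and if $C_\varphi \subset D \subseteq B_\varphi$, then $a \in Cn(D)$;

(c) $a \notin Cn(C_\rho \cup C_\varphi)$, and if $C_\rho \subset E \subseteq B_\rho$, then $a \in Cn(C_\varphi \cup E)$.

Definition 8. Let B be a finite base, and let $\Phi(C) = \bigwedge_{b_i \in C_\rho \cup C_\varphi} b_i$. The
prioritized base contraction of B by a is the set $B \ominus_1 a = \{\bigvee_{C \in B \Downarrow a} \Phi(C)\}$.

If we take the assumption that the sentence we want to retract is not implied
by facts alone, that is $a \notin Cn(B_\varphi)$, then the elements of $B \Downarrow a$ have the form
$(C_\rho, B_\varphi)$, $\Phi(C) = \bigwedge_{b_i \in C_\rho} b_i \wedge \bigwedge_{\varphi_i \in B_\varphi} \varphi_i = \Phi(C_\rho) \wedge \bigwedge_{\varphi_i \in B_\varphi} \varphi_i$ and the following
holds:

$$\text{if } a \notin Cn(B_\varphi), \text{ then } B \ominus_1 a = \Big(\Big\{\bigvee_{C \in B \Downarrow a} \Phi(C_\rho)\Big\}, B_\varphi\Big) \qquad (2)$$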

Another approach is related to the $\ominus_2$ operation, whose idea is due to [1]. We
assume that only rules have to be blamed for a, so only rules involved in deriving
a are deleted from the base; no set in $B \Rightarrow a$ contains only facts.

Definition 9. For each $C \in B \Rightarrow a$, let $C_\rho = B_\rho \cap C$, such that $C \not\subseteq B_\varphi$. The safe
contraction of a from B is $B \ominus_2 a = (\{b \mid b \in B_\rho \text{ and } b \notin \bigcup_{C \in B \Rightarrow a} C_\rho\}, B_\varphi)$.

A third approach may be derived extending the $\ominus_3$ operation. This uses prioritized
incisions. We assume, as usual, that a cannot be derived by facts alone. As
above, for $C \in B \Rightarrow a$, $C_\rho = B_\rho \cap C$ and $C_\varphi = B_\varphi \cap C$.

Definition 10. A prioritized incision (p-incision) on $B \Rightarrow a$ is a set $I \subseteq B_\rho$ such
that $C_\rho \cap I \neq \emptyset$ for each $C \in B \Rightarrow a$. A p-incision is minimal if for all $D \subset I$, D
is not a p-incision. The family of all possible p-incisions is denoted by $B \dagger a$.

Definition 11. The excided p-contraction of B by a, $B \ominus_3 a$, is the set
$\{b \mid b \in B \text{ and } b \notin \bigcup_{I \in B \dagger a} I\}$.

As above, we examine the relations among these three types of operations. A
result like the one obtained in the unpartitioned case holds, that is:

Theorem 12. If $a \in Cn(B)$ and B is finite, then $B \ominus_1 a = B \ominus_2 a = B \ominus_3 a$.

As before, it is easy to see that $\ominus_2$ and $\ominus_3$ are equivalent, as p-incisions contain
all and only the rules contained in at least one minimal subset, only differently
arranged: $\bigcup_{C \in B \Rightarrow a} C_\rho = \bigcup_{I \in B \dagger a} I$.

3 Modifying Is Better Than Deleting

3.1 The Proposed Procedure 

In this section, we show how to modify rules in order to obtain a revised base. The
revised base is equivalent to the one we would obtain from the previously described
methods, but the rule part, taken alone, has a larger set of consequences than its
counterparts. That is, if RB is the revised base obtained through our procedure,
and RB1 is the revised base obtained through one of the previously described
procedures, $Cn(RB) = Cn(RB1)$ but $Cn(RB_\rho) \supseteq Cn(RB1_\rho)$.

Admittedly, taking this as an advantage is a matter of taste. In order to 
defend our view, we would like to compare rules to programs, and facts to data. 
Even if two programs are equivalent with respect to a certain set of data, it 
is fully justified to think that one is superior to the other if the former has a 
broader field of applicability. In a sense, if information is defined as constraints 
on the set of possible worlds, we want to get a set of rules that is as informative 
as possible. Another view is that we want to get the most from our base in the 
unlucky case that we are forced to retract all the facts we believe in. In Section
3.2 the concept of retraction, as opposed to contraction, is discussed at some
length.

Dealing with belief bases instead of belief sets is difficult because it is not 
clear how much we have to compromise with syntax. We try to take a third 
way between irrelevance of syntax (reducing belief bases to belief sets) and full 
dependence on how propositions are expressed. In order to reach this deal, we 
use from one side a semantic approach (relating bases to their consequences) 
and from the other side a sort of syntactical normal form (see below). We want 
to stress that we want to describe a viable way of revising effectively a finite set 
of beliefs. By the way, the approach is fully algorithmic and can be implemented 
in the framework of well known procedures for tableaux-based theorem provers. 
We hope to report about it shortly. 

Bases, Evaluations, Models. Each fact $\varphi_i$ may be rewritten as a disjunction
of possibly negated atoms: $\varphi_i = \bigvee_k \gamma_{ik}$; similarly each rule $\rho_i$ may be rewritten
as

$$\bigvee_j \neg\alpha_{ij} \vee \bigvee_k \beta_{ik} = \alpha_i \rightarrow \beta_i$$

where $\alpha_i = \bigwedge_j \alpha_{ij}$ and $\beta_i = \bigvee_k \beta_{ik}$. Facts and rules in this form are said to be
in normal form (or, simply, normal).

Let X be a set of atoms, and $\Sigma(X)$ be the set of sentences built upon X.
Let v be an evaluation function for X, $v : X \rightarrow \{true, false\}$. We extend v
to $\Sigma(X) \rightarrow \{true, false\}$ in the standard way. Let V be the set of evaluation
functions for $\Sigma(X)$.

Definition 13. $v \in V$ is a model of $a \in \Sigma(X)$ iff $v(a) = true$. v is a model of
$A = \{a_i\}$ iff v is a model of each $a_i$. The set of models of a base B is denoted by
$V(B)$.

Proposition 14. If $V(a_1)$ and $V(a_2)$ are the sets of models of $a_1$ and $a_2$, respectively, then
$V(a_1 \wedge a_2) = V(\{a_1, a_2\}) = V(a_1) \cap V(a_2)$; and $V(a_1 \vee a_2) = V(a_1) \cup V(a_2)$.

$V(B) = V(B_\rho) \cap V(B_\varphi)$; $B_1 \subseteq B_2 \Rightarrow V(B_1) \supseteq V(B_2)$; $Cn(B_1) \subseteq Cn(B_2) \Leftrightarrow V(B_1) \supseteq V(B_2)$.

Adding formulae to a base means getting a smaller set of models. In particular,
if a is a formula inconsistent with a base B, $V(B \cup \{a\}) = \emptyset$. The revision of
$B = (B_\rho, B_\varphi)$ by a should be some base B' such that $V(B'_\rho) \cap V(B'_\varphi) \neq \emptyset$.
In the process of revision, only the rule part of the base is changed, that is,
$B'_\varphi = B_\varphi \cup \{a\}$. So, finding a revision amounts to finding a new set of rules $B'_\rho$.
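
The set-of-models view can be made concrete by enumerating truth assignments. The following is a small sketch under stated assumptions: the function `V`, the two atoms and the example sentences are hypothetical, and models are represented as tuples of truth values ordered like `ATOMS`.

```python
from itertools import product

ATOMS = ("p", "q")

def V(sentences):
    """Set of models of a collection of sentences (truth functions)."""
    result = set()
    for values in product([False, True], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(s(v) for s in sentences):
            result.add(values)
    return result

a1 = lambda v: v["p"]                       # p
a2 = lambda v: (not v["p"]) or v["q"]       # p -> q
conj = lambda v: a1(v) and a2(v)            # a1 and a2
disj = lambda v: a1(v) or a2(v)             # a1 or a2
not_q = lambda v: not v["q"]                # a formula inconsistent with {a1, a2}
```

With these definitions, conjunction corresponds to intersection of model sets, disjunction to union, and adding `not_q` to the base `{a1, a2}` leaves no model at all.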

Models and Revisions. In principle, nothing is said about the relationships
between $V(B'_\rho)$ and $V(B_\rho)$. Some properties are however desirable. First of all,
we want the new formula a to be independent from the new rules $B'_\rho$, that
is, $V(B'_\rho) \cap V(a) \neq V(B'_\rho)$; then we want that the new rules do not discard
any of the models for the original rules, that is, $V(B'_\rho) \supseteq V(B_\rho)$. This is in line
with Levi’s idea that a revision loses something and then adds something else;
in this case, the set of models gets larger (from $V(B_\rho)$ to $V(B'_\rho)$) and then is
(effectively) intersected with $V(a)$. This covers the main case. If a is consistent
with B, that is $V(B_\rho) \cap V(B_\varphi) \cap V(a) \neq \emptyset$, we simply add a to the set of facts,
leaving the rules unchanged.

So the complete definition is: 

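Definition 15. A revision of $B = (B_\rho, B_\varphi)$ by $a$ is a (finite) base $B' = (B'_\rho, B_\varphi \cup \{a\})$
such that:

If $V(B_\rho) \cap V(B_\varphi) \cap V(a) \neq \emptyset$, then $B'_\rho = B_\rho$, else:

(a) $V(B'_\rho) \cap V(B_\varphi) \cap V(a) \neq \emptyset$;

(b) $V(B'_\rho) \supseteq V(B_\rho)$;

(c) $V(B'_\rho) \cap V(a) \neq V(B'_\rho)$.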

Now we have to define a minimal inconsistent sub-base; this may be easily done 
using the definition of B=^a. A minimal inconsistent sub-base for (Bp,B,p U {a}) 
is simply an element of B=^-ia. When recast in the present semantic framework, 
the definition becomes: 

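Definition 16. Let $B = (B_\rho, B_\varphi)$ be an inconsistent base. A minimal inconsistent
sub-base of $B$ is a base $C = (C_\rho, C_\varphi)$ such that

1. $V(C_\rho) \cap V(C_\varphi) = \emptyset$;

2. $C_\rho \subseteq B_\rho$ and $C_\varphi \subseteq B_\varphi$;

3. if $D_\rho \subseteq C_\rho$, $D_\varphi \subseteq C_\varphi$, and either $D_\rho \subset C_\rho$ or $D_\varphi \subset C_\varphi$, then
$V(D_\rho) \cap V(D_\varphi) \neq \emptyset$.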

The set of rules not included in the rule part of any minimal inconsistent
sub-base will be denoted $\hat{B}_\rho$. When clear from the context, $\hat{B}_\rho$ will denote the set
of rules not included in the rule part of any minimal inconsistent sub-base for
$(B_\rho, B_\varphi \cup \{a\})$. If needed, we shall use the notation $\hat{B}_{\rho,a}$.
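
Definition 16 can be applied literally on small bases by enumerating sub-bases. The sketch below is a brute-force illustration under stated assumptions: the function names are hypothetical, formulas are encoded as Python predicates, and the base is the normal-form base that appears later in Example 30, with the new fact already added to the fact part. It is exponential in the size of the base and is not the procedure proposed in the paper.

```python
from itertools import combinations, product

ATOMS = ("a", "b", "c")

def consistent(formulas):
    """True iff some evaluation over ATOMS satisfies every (name, test) pair."""
    for values in product([True, False], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(test(v) for _, test in formulas):
            return True
    return False

def subsets(items):
    """All subsets of items, as frozensets."""
    for size in range(len(items) + 1):
        for chosen in combinations(items, size):
            yield frozenset(chosen)

def minimal_inconsistent_subbases(rules, facts):
    """Pairs (C_rho, C_phi) that are inconsistent while every proper
    sub-pair is consistent (conditions 1 to 3 of Definition 16)."""
    bad = [(cr, cf) for cr in subsets(rules) for cf in subsets(facts)
           if not consistent(cr | cf)]
    return [(cr, cf) for cr, cf in bad
            if not any((dr, df) != (cr, cf) and dr <= cr and df <= cf
                       for dr, df in bad)]

RULES = [("a -> c", lambda v: not v["a"] or v["c"]),
         ("b -> c", lambda v: not v["b"] or v["c"])]
FACTS = [("a", lambda v: v["a"]),
         ("not c", lambda v: not v["c"])]   # the new fact is already included

subbases = minimal_inconsistent_subbases(RULES, FACTS)
involved = {name for cr, _ in subbases for name, _ in cr}
# Rules outside every minimal inconsistent sub-base.
untouched = [name for name, _ in RULES if name not in involved]
print(untouched)
```

For this base only the rule a -> c takes part in the inconsistency, so `untouched` contains the single rule b -> c.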
Orderly Revision and Minimax Revision Saying that rules are not ordered
by importance means, roughly, that all the rules (involved in inconsistency)
should be modified, their sets of models getting larger. We say that a revision is
orderly if it modifies all and only the rules involved in inconsistency. Rules in the
new base correspond to rules in the old one, with the possible exception that some
old rules are eliminated. There is no reason to modify rules that do not belong
to any minimal inconsistent sub-base: so a “good” revision $B' = (B'_\rho, B_\varphi \cup \{a\})$
should have a set of rules $B'_\rho$ such that $\hat{B}_\rho \subseteq B'_\rho$, that is, $V(B'_\rho) \subseteq V(\hat{B}_\rho)$. This
means that $V(B') = V(B'_\rho) \cap V(B_\varphi) \cap V(a) \subseteq V(\hat{B}_\rho) \cap V(B_\varphi) \cap V(a)$ and sets
an upper bound on the size of $V(B')$. It is immediate to see that $\hat{B}_\rho$ is the set
of rules obtained through the operation of safe revision: B 02 O- 

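Definition 17. Let $B \top a$ be the family of sets of rules $B'_\rho$ such that:

1. $V(B_\rho) \subseteq V(B'_\rho) \subseteq V(\hat{B}_\rho)$;

2. $V(B'_\rho) \cap V(B_\varphi) \cap V(a) \neq \emptyset$.

Let $m(B \top a)$ be the set of $\subseteq$-minimal elements of $B \top a$: $B'_\rho \in m(B \top a)$ iff there
is no $B''_\rho \in B \top a$ such that $V(B''_\rho) \subset V(B'_\rho)$. A minimax revision of $B$ by $a$ is
$\tilde{B} = (\tilde{B}_\rho, B_\varphi \cup \{a\})$ such that $V(\tilde{B}_\rho) = \bigcup_{B'_\rho \in m(B \top a)} V(B'_\rho)$.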

Theorem 18. $V(\tilde{B}) = V(\hat{B}_\rho) \cap V(B_\varphi) \cap V(a)$

Theorem 19. There is an orderly revision $B^{*}a$ such that $V(B^{*}a) = V(\hat{B}_\rho) \cap
V(B_\varphi) \cap V(a)$.

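Corollary 20. Let $\{C^s\}_{s=1,\ldots,t} = \{(C^s_\rho, C^s_\varphi \cup \{a\})\}_{s=1,\ldots,t}$ be the set of minimal
inconsistent sub-bases of $B = (B_\rho, B_\varphi \cup \{a\})$. Let $B^{*a}_\rho = \{\hat\rho_1, \ldots, \hat\rho_k, \tilde\rho_1, \ldots, \tilde\rho_n\}$
where $\{\hat\rho_1, \ldots, \hat\rho_k\} = \hat{B}_\rho$ and $\{\tilde\rho_1, \ldots, \tilde\rho_n\}$ are rules such that $V(\tilde\rho_i) = V(\rho_i) \cup
(V(B_\varphi) \cap V(a))$, where $\rho_i \in C^s_\rho$. The revision $B^{*}a = (B^{*a}_\rho, B_\varphi \cup \{a\})$ is orderly
and minimax.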

From Theorems EH and HD R(B-) = UB'G™(BTa) ^(B'p) C R(Bp). 
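Theorem 21. $B^{*}a = (\{\tilde\rho_i\}, B_\varphi \cup \{a\})$ is a minimax revision where:

$$\tilde\rho_i = \begin{cases} \alpha_i \rightarrow \beta_i & \text{if } \alpha_i \rightarrow \beta_i \in \hat{B}_\rho \\ \{(\alpha_i \wedge \neg\varphi) \rightarrow \beta_i\} \text{ where } \varphi \in B_\varphi \cup \{a\} & \text{otherwise} \end{cases}$$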
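
The rule transformation of Theorem 21 can be sketched in a few lines of Python and compared with the result printed in Example 31. This is a hedged illustration, not the authors' implementation: the helper names (`models`, `involved_rules`, `revise`) are hypothetical, formulas are encoded as predicates over truth assignments, and the rules involved in inconsistency are found by exhaustive enumeration, which is feasible only for very small bases.

```python
from itertools import combinations, product

ATOMS = ("a", "b", "c", "d")
EVALS = [dict(zip(ATOMS, vals)) for vals in product([True, False], repeat=len(ATOMS))]

def models(formulas):
    """V(formulas), as the set of indices of the evaluations satisfying all formulas."""
    return {i for i, v in enumerate(EVALS) if all(f(v) for f in formulas)}

def implies(alpha, beta):
    """The formula alpha -> beta."""
    return lambda v: (not alpha(v)) or beta(v)

def index_subsets(n):
    return [frozenset(c) for size in range(n + 1) for c in combinations(range(n), size)]

def involved_rules(rules, facts):
    """Indices of the rules occurring in some minimal inconsistent sub-base."""
    bad = [(rs, fs) for rs in index_subsets(len(rules)) for fs in index_subsets(len(facts))
           if not models([implies(*rules[i]) for i in rs] + [facts[j] for j in fs])]
    minimal = [(rs, fs) for rs, fs in bad
               if not any((r2, f2) != (rs, fs) and r2 <= rs and f2 <= fs for r2, f2 in bad)]
    return set().union(*(rs for rs, _ in minimal))

def revise(rules, facts, new_fact):
    """Rule transformation of Theorem 21; rules are (alpha, beta) pairs."""
    all_facts = facts + [new_fact]
    involved = involved_rules(rules, all_facts)
    new_rules = []
    for i, (alpha, beta) in enumerate(rules):
        if i not in involved:
            new_rules.append(implies(alpha, beta))      # rule kept as it is
        else:
            for phi in all_facts:                       # one weakened copy per fact
                weakened = lambda v, al=alpha, ph=phi: al(v) and not ph(v)
                new_rules.append(implies(weakened, beta))
    return new_rules, all_facts

atom = lambda name: (lambda v: v[name])
neg = lambda f: (lambda v: not f(v))
conj = lambda f, g: (lambda v: f(v) and g(v))
a, b, c, d = (atom(x) for x in ATOMS)

# Base and new fact of Example 31.
new_rules, all_facts = revise([(a, b), (b, c), (b, d)], [a, d], neg(c))

# Rule set printed in Example 31.
printed = [implies(conj(a, neg(d)), b), implies(conj(a, c), b),
           implies(conj(b, neg(a)), c), implies(conj(b, neg(d)), c), implies(b, d)]
same_rule_models = models(new_rules) == models(printed)
revised_is_consistent = bool(models(new_rules + all_facts))
```

The sketch produces a few tautological rules (for instance the copy of a -> b weakened by the fact a) that the paper drops from the printed result; they do not change the set of models, which is why the comparison is made on models rather than on syntax.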



3.2 Minimax Revision and AGM Postulates 

We are now going to examine relations between minimax base revision and the
AGM theory of belief set revision. Let us say that a sentence $a$ belongs to the
belief set generated by $B$ iff $a$ holds in any model in $V(B)$, that is:

Definition 22. Given a base $B = (B_\rho, B_\varphi)$, the belief set generated by $B$ is
denoted $K(B)$. A sentence $a \in K(B)$ iff $V(B) \subseteq V(a)$.
It follows from the definition that $K(B_1) \subseteq K(B_2)$ iff $V(B_2) \subseteq V(B_1)$. From
Theorem 19 we know that $V(B^{*}a) = V(\hat{B}_\rho) \cap V(B_\varphi) \cap V(a)$, that is, $V(B^{*}a) =
V((\hat{B}_\rho, B_\varphi \cup \{a\}))$. It follows that $K(B^{*a}_\rho, B_\varphi \cup \{a\}) = K(\hat{B}_\rho, B_\varphi \cup \{a\})$, $V(B^{*a}_\rho) \subseteq
V(\hat{B}_\rho)$, and $K(\hat{B}_\rho) \subseteq K(B^{*a}_\rho)$.

The importance of this fact becomes apparent if we introduce another
operation, which we name retraction to distinguish it from contraction. Retraction
applies only to (sentences implied only by) facts and modifies the fact part of a
base.

Definition 23. The retraction of ip j from B = (Bp,B,p) is the base B_,p^. = 

(Bp, 



Proposition 24. For any (pj € B,pU{a}, K((Bp,B,pU{a})_,pj) C K((B*“,B,pU 

To show the result it suffices to show that R((Bp“, B,p U ) C F((Bp, B,p U 

{a})_,p^). This holds because R(B*“) C VfQp). 

We may now define contraction from revision and retraction. 

Definition 25. The contraction o/B = (Bp,B,p) by a is B“a = (B*“'“,B,p) = 
B“a = (B*^“,B,p U {-'o})_^a- 

Postulates for Revision In the following table we show the status of minimax
revision with respect to axioms K*1-K*8. Axioms are interpreted with K
meaning $K(B_\rho, B_\varphi)$, K*a meaning $K(B^{*}a)$ and K+a meaning $K(B_\rho, B_\varphi \cup \{a\})$.

axiom   holds?   notes
K*1     yes
K*2     yes
K*3     yes      if a is consistent with B then K*a = K+a else K+a = $\emptyset$
K*4     yes      if a is consistent with B, then K*a = K+a
K*5     no       it holds: K*a = $K_\bot$ iff $\neg a$ is logically true or $\neg a \in Cn(B_\varphi)$
K*6     yes      if $a \leftrightarrow b$ is logically true, then $V(B_\varphi \cup \{a\}) = V(B_\varphi \cup \{b\})$
K*7     yes      $V(B^{*}(a \wedge b)) = V(\hat{B}_\rho) \cap V(B_\varphi) \cap V(a) \cap V(b)$
                 $V((B^{*}a)+b) = V(\hat{B}_\rho) \cap V(B_\varphi) \cap V(a) \cap V(b)$
K*8     ?        it holds: if $\hat{B}_{\rho,a \wedge b} = \hat{B}_{\rho,a}$, then (K*a)+b $\subseteq$ K*(a $\wedge$ b)

In conclusion, our revision operation on bases defines a corresponding revision 
operation on belief sets which satisfies the AGM postulates, with the exception 
of K*5 and K*8 which are substituted by weaker ones. 

® In order to keep the definition simple, we suppose that the fact part of the base is 
irreducible, that is, no fact can be deduced from the rest of the base. For technical
reasons, we say that in this case $B_\varphi$ is irreducible w.r.t. $B_\rho$. In the following, bases
will be assumed to be irreducible. 
The Axiom of Recovery As for contraction, defined through retraction, we
focus on the most controversial of the postulates, that is, recovery:

if $a \in K$, then $K = (K^{-}a)+a$

Then (B“a)+a = B,^))+a; moreover, a belongs to the consequences of 

B, that is, y(B) C V{a). 

Proposition 26. The contraction defined from minimax revision and retraction 
satisfies the postulate of recovery. 



3.3 Discussion 

Theorem gives us a simple recipe for transforming rules. Not only is the 
procedure fully algorithmic, but it can also be decomposed: the revision is built 
piecewise, starting from individual rules and individual facts. 

The result of the revision is itself in normal form, so that iterated revision is 
well-defined, contrary to what results for standard AGM belief set revision. (For 
ways of modifying the AGM theory in order to allow iterated revision, see:[2],[S|)- 
Minimax revision offers a clear definition of so-called multiple revision, that
is, revision by means of a set of sentences. When contraction is considered as the
main operation, using a set of sentences instead of one often results in unclear
intuitions (see [5]).

Extensionality does not hold in general. Adding to a set a derivative rule 
does not change the set of consequences of the rule, but it may change the 
output of a revision operation. This suggests a stronger (not algorithmic) notion 
of equivalence between two bases: 

Definition 27. (Strong equivalence). $B1 = (B1_\rho, B1_\varphi)$ is strongly equivalent to
$B2 = (B2_\rho, B2_\varphi)$ iff

1. $V(B1) = V(B2)$ and

2. $\forall a$, $V(B1^{*}a) = V(B2^{*}a)$.

Equivalent, but not strongly equivalent, bases are given in Example 32.

4 Examples 

Five examples are described. The first two are closely related to each other, and
show how the procedure described in the preceding sections fares when compared
to standard procedures (e.g. safe revision or prioritized base revision). The third
example illustrates the need for using rules in normal form. The fourth one shows
that the procedure, notwithstanding its simplicity, may deal with contradiction
resulting from chains of implications.
Example 28. Let $B = (\{a \rightarrow c, b \rightarrow c\}, \{a \vee b\})$. Let the new fact be $\neg c$.

$B^{*}\neg c = (\{(a \wedge \neg((a \vee b) \wedge \neg c)) \rightarrow c, (b \wedge \neg((a \vee b) \wedge \neg c)) \rightarrow c\}, \{a \vee b, \neg c\})$

$= (\{(a \wedge c) \rightarrow c, (b \wedge c) \rightarrow c\}, \{a \vee b, \neg c\}) = (\emptyset, \{a \vee b, \neg c\})$

The same result would be obtained if standard procedures had been used.

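Example 29. Let $B = (\{a \rightarrow c, b \rightarrow c\}, \{a \wedge b\})$. Let the new fact be $\neg c$.

$B^{*}\neg c = (\{(a \wedge \neg((a \wedge b) \wedge \neg c)) \rightarrow c, (b \wedge \neg((a \wedge b) \wedge \neg c)) \rightarrow c\}, \{a \wedge b, \neg c\})$

$= (\{(a \wedge \neg b) \rightarrow c, (b \wedge \neg a) \rightarrow c\}, \{a \wedge b, \neg c\})$

The only difference between this example and the preceding one is that the fact
component is $\{a \wedge b\}$ instead of $\{a \vee b\}$.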

Standard procedures do not differentiate between the two cases: when rules
are deleted, we always get the revision $(\emptyset, B_\varphi \cup \{\neg c\})$. Our procedure, which
modifies rules on the ground of the new set of facts, produces instead a different
result.

It could be argued that the models of the two revisions are exactly the
same, so that the difference is only at a surface level. The answer is that the
two revisions are indeed different as far as the rule part is concerned, because
$V(\{(a \wedge \neg b) \rightarrow c, (b \wedge \neg a) \rightarrow c\}) \subset V(\emptyset)$. Were we to retract all facts, we would
end up with a different set of evaluations. Our procedure is sensitive to
differences which do not influence the behavior of other procedures.
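
The claim about Example 29 can be verified by enumerating evaluations. The Python sketch below is an illustration only; the helper `models` and the encoding of formulas as predicates are assumptions of the example.

```python
from itertools import product

ATOMS = ("a", "b", "c")

def models(formulas):
    """Set of evaluations (tuples ordered like ATOMS) satisfying all formulas."""
    result = set()
    for values in product([True, False], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(f(v) for f in formulas):
            result.add(values)
    return result

facts = [lambda v: v["a"] and v["b"],   # a AND b
         lambda v: not v["c"]]          # NOT c (the new fact)

deleted_rules = []                      # standard procedures: empty rule part
modified_rules = [
    lambda v: not (v["a"] and not v["b"]) or v["c"],   # (a AND NOT b) -> c
    lambda v: not (v["b"] and not v["a"]) or v["c"],   # (b AND NOT a) -> c
]

# The two revised bases have the same models ...
same_base_models = models(deleted_rules + facts) == models(modified_rules + facts)
# ... but the rule parts alone do not: the modified rules exclude some evaluations.
rule_part_is_stronger = models(modified_rules) < models(deleted_rules)
```

Once the facts are retracted, the modified rules still rule out two of the eight evaluations, while the empty rule set rules out none.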

These two examples shed some light on a sort of monotonicity property of the 
procedure. The (fact component of the) base of the first example is, intuitively, 
weaker than the (fact component of the) base of the second one, and this is 
reflected by the corresponding sets of evaluations. The two revisions show the 
same trend, in the sense that the rule set resulting from the first revision (the 
empty set) is of course weaker than the rule set resulting from the second revision. 



Example 30. Let $B = (\{a \vee b \rightarrow c\}, \{a\})$ (the rule is not in normal form). Let
the new fact be $\neg c$. If we did not care about the normal form, the revision would
be:

$(\{((a \vee b) \wedge \neg(a \wedge \neg c)) \rightarrow c\}, \{a, \neg c\}) = (\{(b \wedge \neg a) \rightarrow c\}, \{a, \neg c\})$

A base equivalent to $B$ and in normal form is the base $B' = (\{a \rightarrow c, b \rightarrow c\},
\{a\})$. As $b \rightarrow c$ is not included in any minimal inconsistent sub-base,

$B'^{*}\neg c = (\{(a \wedge \neg(a \wedge \neg c)) \rightarrow c, b \rightarrow c\}, \{a, \neg c\}) = (\{b \rightarrow c\}, \{a, \neg c\})$

Here again, the difference between the two resulting bases is appreciated only
if we imagine the retraction of facts; in this case, it results that $V(b \rightarrow c) \subset
V((b \wedge \neg a) \rightarrow c)$. This is a general result: rules in normal form result in smaller
sets of evaluations, that is, more specific rule sets.
Example 31. Let $B = (\{a \rightarrow b, b \rightarrow c, b \rightarrow d\}, \{a, d\})$. Let the new fact be $\neg c$.
The only minimal inconsistent sub-base is $(\{a \rightarrow b, b \rightarrow c\}, \{a, d, \neg c\})$.

$B^{*}\neg c = (\{a \wedge \neg d \rightarrow b, a \wedge c \rightarrow b, b \wedge \neg a \rightarrow c, b \wedge \neg d \rightarrow c, b \rightarrow d\}, \{a, d, \neg c\})$

This should be contrasted with the result of standard procedures, which delete
rules involved in contradiction: $(\{b \rightarrow d\}, \{a, d, \neg c\})$. Our procedure produces a
more specific rule set, or, from another point of view, a rule set which leaves
open the possibility of more inferences. This example shows also that the simple
procedure automatically deals with contradictions deriving from chains of
conditionals.



Example 32. Let

$B1 = (\{a \rightarrow b, b \rightarrow c\}, \{b\})$    $B2 = (\{a \rightarrow b, b \rightarrow c, a \rightarrow c\}, \{b\})$

be two equivalent bases. It is easy to see that $V(B1_\rho) = V(B2_\rho)$. Let the new
fact be $\neg c$; the revisions are then

$B1^{*}\neg c = (\{a \rightarrow b\}, \{b, \neg c\})$    $B2^{*}\neg c = (\{a \rightarrow b, a \rightarrow c\}, \{b, \neg c\})$

and $V(B2^{*}\neg c) \subset V(B1^{*}\neg c)$. This example shows that in dealing with bases,
syntax matters.
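
The two claims of Example 32 (equal models before revision, strictly nested models after it) can be confirmed by enumeration. As before, this Python sketch is only illustrative and its helper names are assumptions; the revised rule sets are copied from the example rather than recomputed.

```python
from itertools import product

ATOMS = ("a", "b", "c")

def models(formulas):
    """Set of evaluations (tuples ordered like ATOMS) satisfying all formulas."""
    result = set()
    for values in product([True, False], repeat=len(ATOMS)):
        v = dict(zip(ATOMS, values))
        if all(f(v) for f in formulas):
            result.add(values)
    return result

a_b = lambda v: not v["a"] or v["b"]    # a -> b
b_c = lambda v: not v["b"] or v["c"]    # b -> c
a_c = lambda v: not v["a"] or v["c"]    # a -> c (derivable from the other two)
fact_b = lambda v: v["b"]
not_c = lambda v: not v["c"]            # the new fact

# Before revision the rule parts have the same models.
equal_rule_parts = models([a_b, b_c]) == models([a_b, b_c, a_c])

# After revision (rule sets as printed in Example 32) they differ.
revised_1 = models([a_b, fact_b, not_c])
revised_2 = models([a_b, a_c, fact_b, not_c])
strictly_nested = revised_2 < revised_1
```

The first revision leaves the value of a open, while the second forces a to be false because the derived rule a -> c survives the revision.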

5 Directions for Future Work 

In this paper we have described a simple procedure for modifying knowledge 
bases expressed by finite sets of formulae of a propositional language, where 
each formula is a rule (which can be changed) or a fact (which cannot), to 
accommodate new facts; the main difference between our approach and other 
approaches is that rules are modified, instead of being deleted. 

This procedure may be extended in many directions. The first possibility is 
the extension to bases with more than two degrees of importance for sentences. 
This is the case with normative systems, where the hierarchy of importance 
stems directly from the hierarchy of normative sources. The second direction 
deals with the meaning of the rules. While the belief revision literature usually 
employs a propositional language, speaking about rules and facts suggests that 
rules might be rather seen as axiom schemes, describing universally quantified 
relationships between variables. How do belief revision procedures behave in this 
new setting? This is, we believe, a general question, to which few answers are
given in the literature. The third field for extending the framework is that of 
(propositional) modal logic. The motivation is again that of representing some 
features of the evolution of normative systems, which are usually represented 
by means of systems of modal (deontic) logic. It is indeed the mechanism of 
derogation (specifying exceptions) which originally suggested to modify rules 
instead of deleting them. 
Abstract. In this paper we propose a hybrid system that bridges the
gap between traditional image processing methods, used for low-level
object recognition, and abductive constraint logic programming used for
high-level musical interpretation. Optical Music Recognition (OMR) is
the automatic recognition of a scanned page of printed music. All such
systems are evaluated by their rate of successful recognition; therefore a
reliable OMR program should be able to detect and eventually correct its
own recognition errors. Since we are interested in dealing with polyphonic
music, some additional complexity is introduced as several concurrent
voices and simultaneous musical events may occur. In RIEM, the OMR
system we are developing, when events are inaccurately recognized they
will generate inconsistencies in the process of voice separation. Furthermore,
if some events are missing a consistent voice separation may not
even be possible.

In this work we propose an improved architecture for RIEM to allow 
the system to hypothesize on possible missing events, to overcome the 
major failure in the voice assignment due to minor recognition failures. 
We formalize the process of voice assignment and present a practical 
implementation using Abductive Constraint Logic Programming. 

Once we abduce these missing events and know where to look for them in
the original score image, we may provide the proper feedback to the
recognition algorithms, relaxing the recognition thresholds gradually, until
some minimum quality recognition criterion is reached.

Keywords: Abduction, Constraints, Optical Music Recognition,
Knowledge Based Recognition



1 Introduction and Motivation 

Optical Music Recognition (OMR) is the automatic recognition of a scanned
page of printed music. All such systems are evaluated by their rate of successful
recognition; therefore, a reliable OMR program should be able to detect and
eventually correct its own recognition errors. In this work we propose an
architecture for an OMR system to deal with uncertain and missing information.
We also present a practical implementation of such a system using Abductive
Constraint Logic Programming.

Fig. 1. Score sample

The difficulties presented to an OMR system are common to most image 
recognition systems [2]: poor quality of the original, undesirable segmentation
resulting from the image scanning process and loss of detail resulting from several 
processing phases. In the particular case of OMR the high density of musical 
information creates additional segmentation problems because many graphical 
objects intersect others or touch where, syntactically, they should not. Other 
difficulties arise also from the inconsistent nature of music writing (e.g. identical 
symbols with slight variations in size can have different meanings). 

Since we are interested in dealing with polyphonic music, some additional 
complexity is introduced as several concurrent voices and simultaneous music 
events may occur. In RIEM m , the OMR system we are developing, a scheduling 
algorithm was used to find assignments of events to voices [2]. If events are
inaccurately recognized they will generate inconsistencies in the process of voice
separation. Furthermore, if some events are missing (i.e. symbols not recognized)
a voice separation¹ may not be achieved. In the example of Fig. 1, representing an
excerpt of a music score with two voices and eight events, one possible separation
of voices would be to assign events e1, e2, e3 and e4 to one voice and e5, e6, e7
and e8 to another voice. If, for example, the recognition of event e2 fails, it is
no longer possible to perform a voice separation because, according to standard
music notation², the duration of all voices must be equal.

To overcome these problems some approaches have been presented, making 
use of music knowledge to enhance the process of recognition. In [Ij and [Sj, a 
grammar describing musical notation is proposed to control the recognition pro- 
cess particularly the decomposition and labeling of objects. A solution is presen- 
ted for the problem of voice reconstruction in polyphonic scores and a simplified 

¹ In this work we refer to voice separation as the process of syntactical reconstruction
of the music score. No semantical music knowledge is involved in this process.

² Refers to “classical” music notation.
prototype has also been implemented, however limited to two voices per staff,
and unable to go back to the image score to find unrecognized information. 

The absence of high-level musical knowledge during the first phases of reco- 
gnition leads to a loss of resources in searching for symbols where they have no 
chance to exist. Therefore an OMR system architecture should allow feedback 
of high-level knowledge, obtained during the interpretation phase, to low-level 
data during the recognition [El- 

Most knowledge-based approaches have mentioned the need to detect reco- 
gnition errors but they seldom explain concise methods for repairing or compen- 
sating these errors, in particular when objects are not recognized. 

In this work we propose an architecture to allow an OMR system to
hypothesize on possible missing events, to overcome a failure in the voice assignment
process due to minor recognition failures. For this purpose, we will use Abductive
Constraint Logic Programming (ACLP) [11] [12] [13], which is a system that
integrates, into a single framework, both the paradigms of Abductive Logic
Programming (ALP) and Constraint Logic Programming (CLP). Both ALP and
CLP can be viewed as forms of hypothetical reasoning where a set of hypotheses
(abducibles in ALP and constraints in CLP) is determined in order to satisfy a
goal. This paradigm will be mainly used to reason about missing events due to
any of the reasons mentioned above. The main advantage of ACLP lies in
the fact that it offers the expressive power of a high-level declarative
representation of problems in a way close to their natural specification (inherited from
ALP) while preserving the efficiency of a specialized constraint solver. The
preservation of both ALP and CLP identities is of great importance when dealing
with problems such as OMR where on one side we have high-level descriptions of
syntactic properties of music notation, better suited for ALP, while on the other
side we have low-level information, such as spatial coordinates, better dealt with
by CLP.

In the example of Fig. 1 we expect the system to abduce a missing event
(e2) while at the same time restricting its spatial coordinates to be situated between
those of e1 and e3. This way we have the notion of abduced event, with high-level
semantic meaning, encapsulating the low-level spatial information.

Once we abduce these events and know where to look for them in the original
score image, we may provide the proper feedback to the recognition algorithms.
During this iterative process we may relax the recognition thresholds gradually,
until some minimum quality recognition criterion is reached.

The remainder of this paper is structured as follows: in Sect. 2 we briefly 
describe the ACLP framework; in Sect. 3 we propose an OMR architecture; in 
Sect. 4 we describe the image pre-processing and recognition modules; in Sect. 5 
we formalize the process of interpretation and present an implementation using 
ACLP; in Sect. 6 we conclude and elaborate on future developments.
2 The ACLP Framework

This description closely follows that of [12]. The ACLP framework consists in
the integration of two forms of hypothetical reasoning, namely Abductive and
Constraint Logic Programming. For a detailed description the reader is referred
to [13].



2.1 The Language of ACLP 

Given an underlying framework of CLP($\mathcal{R}$), we define:

Definition 1 (Abductive theory or program). An abductive theory or
program in ACLP is a triple $(P, A, IC)$ where:

- $P$ is a constraint logic program in CLP($\mathcal{R}$) consisting of rules of the form
$p_0(t_0) \leftarrow c_1(u_1), \ldots, c_n(u_n) \,||\, p_1(t_1), \ldots, p_m(t_m)$ ³ where $p_i$ are predicate
symbols, $c_i$ are constraints in the domain $\mathcal{R}$ and $u_i, t_i$ are terms of $\mathcal{R}$.

- $A$ is a set of abducible predicates, different from the constraints in $\mathcal{R}$.

- $IC$ is a set of integrity constraints, which are first order formulae over the
language of CLP($\mathcal{R}$).



Definition 2 (Goal). A goal, G, has the same form as the body of a program 
rule whose variables are understood as existentially quantified. o 

An ACLP theory or program contains three types of predicates: (i) ordinary 
predicates as in standard LP, (ii) constraint predicates as in CLP and (iii) ab- 
ducible predicates as in ALP. The abducible predicates are normally not defined 
in the program and any knowledge about them is represented either explicitly 
or implicitly in the integrity constraints IC.

The ACLP system allows for the use of Negation as Failure (NAF), handling 
it through abductive interpretation as proposed in [H]. 

The abducibles are seen as high-level answer holders for goals (or queries) to 
the program carrying their solutions. 

Definition 3 (Answer). An answer, $\Delta$, for a goal, $G$, is a set of assumptions
of the form:

- $ab(d)$, where $ab \in A$ and $d \in$ domain of $\mathcal{R}$.

- $\exists X\, (ab_1(X), \ldots, ab_n(X), C(X))$, where $ab_1, \ldots, ab_n \in A$ and $C(X)$ is a set of
CLP($\mathcal{R}$) constraints.

The integrity constraints express high-level properties that must be satisfied by any
solution (set of abducible assumptions) of a goal for this to be accepted.

³ The symbol || is used to separate the constraint conditions from the program
predicate conditions in the conjunction of the body of the rule.
2.2 Declarative Non-monotonic Semantics of ACLP

The (non-monotonic) semantics of ACLP is inherited from that of ALP and 
abduction. An answer for a goal G is correct if it forms an abductive explanation 
for G. 

Definition 4 (Solution). Given a theory $(P, A, IC)$ and a goal $G$, an answer
$\Delta$ is a solution of $G$ iff there exists at least one consistent grounding of $\Delta$ (in
$\mathcal{R}$) and for any such grounding, $\Delta_g$:

- $P \cup \Delta_g$ entails $G_g$, and

- $P \cup \Delta_g$ satisfies the $IC$,

where $G_g$ denotes a corresponding grounding of the goal $G$.

For the details of the corresponding grounding and a formal semantics of the 
integrity constraints the reader is referred to m- Informally integrity constraints 
are sentences that must be entailed by the program together with the abductive 
hypotheses ($P \cup \Delta_g$) for $\Delta_g$ to be a valid set of hypotheses.

Example 1. Consider the following ACLP theory and goal G:

P = { p(X) ← X > 2 || q(X), a(X)
      q(X) ← X > 4, X < 10 || [] }
IC = { ¬(X > 8 || a(X)) }
A = {a}
G = p(X)

A solution of G is Δ = ∃X (a(X), X > 4, X < 8).
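
Example 1 can be replayed with a small brute-force check. The Python sketch below is not ACLP and uses no constraint solver: it assumes a finite integer stand-in for the constraint domain (the paper does not restrict the domain to integers), and the function names are hypothetical.

```python
DOMAIN = range(0, 13)   # assumed finite integer stand-in for the constraint domain

def q(x):
    """q(X) <- X > 4, X < 10 || []"""
    return x > 4 and x < 10

def p(x, abduced):
    """p(X) <- X > 2 || q(X), a(X); a(X) holds only if it has been abduced."""
    return x > 2 and q(x) and x in abduced

def satisfies_ic(abduced):
    """IC: not (X > 8 || a(X)), i.e. no abduced a(X) may have X > 8."""
    return all(not x > 8 for x in abduced)

# Values X for which abducing a(X) explains the goal p(X) and respects IC.
accepted = [x for x in DOMAIN if p(x, {x}) and satisfies_ic({x})]
print(accepted)   # [5, 6, 7, 8]
```

Over this integer domain the check accepts the values 5 to 8; every integer in the interval X > 4, X < 8 given in Example 1 is among them.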



2.3 Implementation 

The ACLP system is built, as a meta-interpreter, on top of the ECLiPSe lan- 
guage 0 for Constraint Logic Programming interfacing appropriately the non- 
monotonic reasoning of abduction with the specialized constraint solving of the 
CLP language. 

Once a theory (P, A, IC) is loaded, the program is executed by calling at the
ECLiPSe level: aclp-solve(goal, initial-hypotheses, output-variable) where:

- goal is an ordinary ECLiPSe goal,

- initial-hypotheses is a list of abducible hypotheses, and

- output-variable is an ECLiPSe variable.

The output-variable returns a list of abducible hypotheses, with their domain
variables instantiated to specific values in their domain, containing the
initial-hypotheses and which is a solution of the goal. Normally, the list of
initial-hypotheses is empty.
Fig. 2. RIEM - three processing modules (data passed between modules:
graphical objects, music events)



3 System Architecture 



In this section we describe the improved architecture of RIEM (a previous version
can be found in [14]), an Optical Music Recognition system. RIEM has a layered
architecture consisting of three main processing modules: Image Pre-processing,
Recognition and Interpretation (see Fig. 2).

The pre-processing module handles all low-level image processing, transforming
image pixels into symbolic data. It reads a scanned page of printed music
in the form of a TIFF image file and performs some basic filtering on the image.
This is followed by the detection and removal of the score staff lines, to allow the
extraction of all remaining graphical objects. These objects are then decomposed and
stored as lists of graphical primitives, before being handed to the recognition
module.

All information generated by the pre-processing module will be trusted in 
further modules. 

The recognition module transforms graphical objects into meaningful musical 
events and symbols. Recognition is accomplished through object feature analysis 
and graphical primitive analysis. In RIEM recognition algorithms make use of 
thresholds that are functions of the score image features. This eliminates the 
need for magic numbers hard coded in the recognition procedures and allows 
the algorithms to cope with some variation in the print quality and style of the 
music scores. 

The interpretation module is responsible for the syntactic reconstruction of
the music score. It attempts a coherent voice separation for a given set of
recognized events, based on music notation rules. During the process of voice separation
the interpretation module may detect some missing events resulting from symbols
that were not recognized. In such cases, information on the hypothetical location of
those events is sent back to the recognition module, which will attempt a looser
recognition in the specified locations. This process repeats until some limit is
reached concerning the relaxation of the recognition parameters.
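
The feedback cycle between interpretation and recognition can be pictured as a simple loop. The Python sketch below is a schematic illustration, not RIEM code: `run_feedback`, the `recognize` and `interpret` callables, the relaxation schedule and the demo stubs are all hypothetical names and values introduced for the example.

```python
def run_feedback(recognize, interpret, schedule=(1.0, 0.9, 0.8, 0.7)):
    """Re-inspect regions with missing events under progressively looser settings.

    recognize(region, selectivity) returns a list of events; region None means
    the whole page. interpret(events) returns the regions where events still
    appear to be missing (an empty list when voice separation succeeds).
    """
    events = list(recognize(None, schedule[0]))   # first pass, maximum selectivity
    regions = interpret(events)
    for selectivity in schedule[1:]:              # limit on the relaxation
        if not regions:
            break
        for region in regions:
            events.extend(recognize(region, selectivity))
        regions = interpret(events)
    return events, regions

# Hypothetical stubs: event e2 is only found once selectivity drops to 0.8.
def demo_recognize(region, selectivity):
    if region is None:
        return ["e1", "e3", "e4"]
    return ["e2"] if selectivity <= 0.8 else []

def demo_interpret(events):
    return [] if "e2" in events else ["between e1 and e3"]

events, unresolved = run_feedback(demo_recognize, demo_interpret)
```

In the demo the first relaxed pass finds nothing, the second recovers e2, and the loop stops because no region is left unresolved.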

At the output of the interpretation module a MIDI standard file and a gra- 
phical music-publishing file may be generated. 
4 Image Pre-processing and Recognition

A more detailed description of the pre-processing and recognition modules
of RIEM follows.

4.1 Image Pre-processing 

The scanned music page, stored in a TIFF file, is read by the system and a 
run-length encoding is performed on the image to eliminate redundancy and 
therefore increase the performance of further processing stages. Some of the 
noise contained in the original scanned image is also filtered at this time. 
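
Run-length encoding replaces each run of identical pixels by a (value, length) pair. A minimal Python sketch on a single binary image row is given below; the function names and the sample row are assumptions for illustration, and the encoding actually used in RIEM may differ in its details.

```python
from itertools import groupby

def run_length_encode(row):
    """Encode a row of pixels as (pixel value, run length) pairs."""
    return [(pixel, len(list(run))) for pixel, run in groupby(row)]

def run_length_decode(runs):
    """Rebuild the row of pixels from its (pixel value, run length) pairs."""
    return [pixel for pixel, length in runs for _ in range(length)]

row = [0, 0, 0, 1, 1, 0, 1, 1, 1, 1]    # 0 = white, 1 = black
runs = run_length_encode(row)
print(runs)   # [(0, 3), (1, 2), (0, 1), (1, 4)]
```

Long horizontal runs of black pixels are also a natural starting point for locating staff line segments, which is one reason the encoding helps later stages.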

Staff-line detection Staff lines are essential for locating and interpreting all
musical symbols. Because staff lines overlap almost all other graphical elements,
their presence interferes with the recognition of the information contained in a 
music page. In RIEM, staff line detection and reconstruction is accomplished 
by detecting and matching contiguous horizontal line segments in the image. 
Our method is analogous to the one described in [3] and can cope with some 
curvature of the staff lines, a frequent case in scanned music pages. 

After the exact location of staff lines is known, they are removed from the 
image in order to isolate all remaining musical symbols. This process is not yet 
perfect since, in some cases, it is difficult to distinguish which parts of a staff 
line intersecting an object belong to the object. As a result, some object parts 
may be inadvertently removed causing additional object segmentation. Also, 
some unremoved staff line segments may be left attached to objects, introducing 
slight changes in their shapes.

At this point we are able to determine two important values that will play the 
role of constants throughout the process. These are the average spacing between 
contiguous staff lines (SL_SPACING) and the average staff line width (SL_WIDTH). 
The first one is a reference for the size of all musical symbols in the score while 
the second is a measurement of the thickness of some detail features and may 
also be used as a tolerance value. 

Most threshold values used in the recognition algorithms depend on these 
values. Since they are redefined for every new score page, we make the recognition 
algorithms independent of variations in music print styles and sizes. 
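
A possible way of deriving the two constants from detected staff lines is sketched below in Python. The input format (one pair of top and bottom pixel rows per staff line) and the choice of measuring spacing as the white gap between neighbouring lines are assumptions of this example; the paper does not specify them.

```python
def staff_constants(staff_lines):
    """Return (SL_SPACING, SL_WIDTH) for one staff.

    staff_lines is a list of (top_row, bottom_row) pixel rows, one pair per
    staff line, ordered from the top of the page downwards.
    """
    widths = [bottom - top + 1 for top, bottom in staff_lines]
    gaps = [staff_lines[i + 1][0] - staff_lines[i][1] - 1
            for i in range(len(staff_lines) - 1)]
    sl_width = sum(widths) / len(widths)     # average line thickness
    sl_spacing = sum(gaps) / len(gaps)       # average gap between contiguous lines
    return sl_spacing, sl_width

# Hypothetical staff: five lines, one of them one pixel thicker than the others.
lines = [(10, 11), (20, 21), (30, 31), (40, 42), (50, 51)]
SL_SPACING, SL_WIDTH = staff_constants(lines)
print(SL_SPACING, SL_WIDTH)   # 7.75 2.2
```

Because both values are recomputed for every page, thresholds expressed as multiples of them scale automatically with the print size.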

Symbol extraction and classification Contour extraction is performed for 
all image objects and all contours are decomposed and stored as lists of graphical 
primitives such as line segments, arcs and junctions. Objects are then classified 
according to size, proportions, and number and type of compound primitives. 

4.2 Recognition 

According to the previous classification, graphical objects are handled by
dedicated recognition algorithms. Recognition is based on graphical primitive
inspection and reconstruction, to find music object components (e.g. note heads, stems,
beams...) and also on feature analysis, to acquire attributes for the recognized
objects (e.g. white or black notes, number of beams...).

Recognition algorithms incorporate numerical thresholds that are a function 
of the two constants SL_SPACING and SL_WIDTH and also a selectivity index R.
Initially the value of R is set to 1 which corresponds to maximum selectivity of 
the algorithms. This value may be further decreased, after each feedback cycle, 
to relax the recognition algorithms when re-inspecting particular areas of the 
image. 

Two types of objects may be recognized: non-temporal and temporal objects. 

Non-temporal objects are usually expression markings or modifiers of an
event's pitch, and they are not handled until the whole process of recognition and
interpretation is terminated. For details on this matter, the reader is referred to 

M- 

The recognition module generates events for all temporal objects and these
are stored in a structure associated with their corresponding staff. All generated
events are uniquely identified and labeled with the corresponding type (note,
rest, accidental,...). Some additional attributes are also stored with the event,
expressing temporal and spatial characteristics.

Spatial attributes include the event's Cartesian coordinates in the image. These
coordinates provide the necessary information to pre-process events in the inter- 
pretation module, where important temporal relations will be inferred, based on 
the relative positioning of events. This spatial event analysis is based on proxi- 
mity comparisons, and makes use of the two constant values determined in the 
pre-processing module. 

Pitch information is not included with the events because it can be
inferred from their coordinates with respect to the already-known location of the
staff lines.

The temporal attribute is the event's duration, which will be used later in
the interpretation module, for the syntactic reconstruction of the score.

In the previous sub-section we have seen that uncertainty is present in the 
process of recognition resulting mainly from image deterioration and object seg- 
mentation. In RIEM, when object features are ambiguously recognized, and they 
determine the duration of an event, the recognition algorithms may, in some ca- 
ses, generate a list of possible durations for the corresponding event. 

5 Interpretation 

In order to cope with polyphony, a correct voice separation is important because 
event synchronism is mandatory for the generation of event-based music file 
formats, like MIDI. 

This voice separation is undertaken in the interpretation module of RIEM, 
based on the syntax of music notation and on topological relations between 
events. 

In this section we describe, in greater detail, the problem of voice separation
and event sequencing, together with its implementation within the interpretation
module. We start with a general description of the problem, together with the
formalization of some needed concepts. The section is concluded with a general
description of the implementation using Abductive Constraint Logic Programming.

Fig. 3. Original score sample and event diagram showing superimposed and sequential
events



5.1 Voice Separation and Event Sequencing 

Music events are represented by graphical symbols that may appear sequentially 
or superimposed in a music score. Such events can, in a simplified form (i.e. 
ignoring some attributes not needed at this point), be defined in the following 
manner: 

Definition 5 (Event). An event $e$ is a triple $\langle X, Y, Dur \rangle$ where:

— $X(e) = X$ and $Y(e) = Y$ represent the event's relative spatial coordinates;

— $Dur_e(e) = Dur$ represents the event's duration. $\diamond$

The relative spatial locations of these events translate into important tem- 
poral relations: superimposed events express simultaneity and they are known 
to have the same starting moment. Sequentially layered events express temporal 
precedence, thus having distinct starting moments. 

These relations are inferred from domain specific spatial analysis (as depicted
in Figure 3) and may be determined for any pair of events. Formally we have:

Definition 6 (Precedence). Let $e_1$ and $e_2$ be two events. Event $e_1$ is said to
precede event $e_2$, denoted by $precede(e_1, e_2)$, iff $X(e_1) < X(e_2)$. $\diamond$

Definition 7 (Simultaneity). Let $e_1$ and $e_2$ be two events. Events $e_1$ and $e_2$
are said to be simultaneous, denoted by $simultaneous(e_1, e_2)$, iff
$X(e_1) = X(e_2)$. $\diamond$






A monophonic line or voice, of a music piece, can be represented by a voice 
(sequence) of related events. 

Definition 8 (Voice). A voice is a finite set of events $V = \{e_1, \ldots, e_n\}$ such
that:

$$\forall e_i, e_j \in V,\ i \neq j:\ precede(e_i, e_j) \lor precede(e_j, e_i). \ \diamond \qquad (1)$$

Associated with each voice we have the notion of its total and partial dura- 
tions, given by the following definitions: 

Definition 9 (Voice Duration). Let $V = \{e_1, \ldots, e_n\}$ be a voice. The voice
duration, denoted by $Dur(V)$, is given by:

$$Dur(V) = \sum_{n} Dur_e(e_n). \ \diamond \qquad (2)$$

The voice duration results from adding the duration of all events that con- 
stitute it. 

Definition 10 (Voice Partial Duration). Let $V = \{e_1, \ldots, e_n\}$ be a voice.
The voice partial duration (up to $e_k \in V$), denoted by $Dur_p(e_k, V)$, is given
by:

$$Dur_p(e_k, V) = \sum_{\substack{e_p \in V \\ precede(e_p, e_k)}} Dur_e(e_p). \ \diamond \qquad (3)$$

The voice partial duration (up to a given event) is the sum of the durations 
of all events that precede the given one. 
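
Definitions 5 to 10 translate almost literally into executable form. The following is a minimal Python sketch, given only as an illustration and not as part of RIEM; the class name `Event` and the function names are hypothetical, and simultaneity is tested by exact equality of the X coordinate, as in Definition 7.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    x: float    # relative horizontal coordinate, X(e)
    y: float    # relative vertical coordinate, Y(e)
    dur: float  # duration, Dur_e(e)


def precede(e1, e2):
    """Definition 6: e1 precedes e2 iff X(e1) < X(e2)."""
    return e1.x < e2.x


def simultaneous(e1, e2):
    """Definition 7: events are simultaneous iff they share the same X."""
    return e1.x == e2.x


def is_voice(events):
    """Definition 8: every pair of distinct events is ordered by precedence."""
    evs = list(events)
    return all(precede(a, b) or precede(b, a)
               for i, a in enumerate(evs) for b in evs[i + 1:])


def voice_duration(voice):
    """Definition 9: sum of the durations of all events of the voice."""
    return sum(e.dur for e in voice)


def partial_duration(ek, voice):
    """Definition 10: sum of the durations of the events that precede ek."""
    return sum(e.dur for e in voice if precede(e, ek))


voice = [Event(10, 50, 1), Event(30, 52, 2), Event(60, 48, 1)]
print(is_voice(voice))                    # True
print(voice_duration(voice))              # 4
print(partial_duration(voice[2], voice))  # 3
```

Two events with the same X coordinate can never belong to the same voice under this reading, which is exactly what condition (1) requires.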

In polyphonic music pieces several voices may exist concurrently. All musical 
events are assigned to a unique voice. These assignments of events to concurrent
voices must obey several spatial and temporal properties. Such assignment is 
defined as follows: 

Definition 11 (Voice assignment). Let $E$ be a finite set of events. A voice
assignment of $E$ is a finite set $\mathcal{V} = \{V_1, \ldots, V_n\}$, where $V_1, \ldots, V_n$ are voices,
such that:

$$\forall V_i, V_j \in \mathcal{V}:\ Dur(V_i) = Dur(V_j). \qquad (4)$$

$$\bigcup_{n} V_n = E. \qquad (5)$$

$$\forall V_i, V_j \in \mathcal{V},\ i \neq j:\ V_i \cap V_j = \emptyset. \qquad (6)$$

$$\forall e_i \in V_m,\ e_j \in V_n:\ simultaneous(e_i, e_j) \Leftrightarrow Dur_p(e_i, V_m) = Dur_p(e_j, V_n). \qquad (7)$$

The number of voices in the assignment is given by $n$. $\diamond$

In the previous definition, (4) imposes that all voices must have the same total
duration; (5) imposes that all events must be assigned to a voice; (6) imposes
that an event cannot be assigned to more than one voice; and (7) means that
simultaneous events serve as synchrony checkpoints, i.e. the partial duration,
up to two simultaneous events, of the voices to which those events are assigned
must be equal; conversely, if the partial duration of two voices is equal, their
next events must be simultaneous.
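
The four conditions of Definition 11 can be checked mechanically for a candidate assignment. The sketch below is only an illustration under stated assumptions: the names `Event` and `is_voice_assignment` are hypothetical, events are compared by exact X coordinate as in Definition 7, and condition (7) is tested over every pair of distinct events.

```python
from itertools import combinations
from typing import NamedTuple


class Event(NamedTuple):
    name: str
    x: float
    y: float
    dur: float


def precede(a, b):
    return a.x < b.x


def simultaneous(a, b):
    return a.x == b.x


def partial_duration(ek, voice):
    return sum(e.dur for e in voice if precede(e, ek))


def is_voice_assignment(events, voices):
    """Check that `voices` is a voice assignment of `events` (Definition 11)."""
    # Each element of the assignment must itself be a voice, condition (1).
    for v in voices:
        if any(not (precede(a, b) or precede(b, a))
               for a, b in combinations(v, 2)):
            return False
    # (4): all voices have the same total duration.
    if len({sum(e.dur for e in v) for v in voices}) > 1:
        return False
    assigned = [e for v in voices for e in v]
    # (5): every event is assigned to some voice.
    if set(assigned) != set(events):
        return False
    # (6): no event is assigned to more than one voice.
    if len(assigned) != len(set(assigned)):
        return False
    # (7): simultaneity holds exactly when partial durations coincide.
    tagged = [(e, v) for v in voices for e in v]
    for (ei, vm), (ej, vn) in combinations(tagged, 2):
        same_offset = partial_duration(ei, vm) == partial_duration(ej, vn)
        if simultaneous(ei, ej) != same_offset:
            return False
    return True


a1, a2 = Event("a1", 10, 0, 2), Event("a2", 50, 0, 2)
b1, b2, b3 = Event("b1", 10, 9, 1), Event("b2", 30, 9, 1), Event("b3", 50, 9, 2)
print(is_voice_assignment([a1, a2, b1, b2, b3], [[a1, a2], [b1, b2, b3]]))  # True
print(is_voice_assignment([a1, a2, b1, b3], [[a1, a2], [b1, b3]]))          # False
```

In the second call the lower voice lacks one event, so the two voices have different total durations and condition (4) fails; this is the situation that arises when an object goes unrecognized.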

Based on the previous definition, we set forth the class of sets of events for 
which it is possible to have at least one voice assignment: 

Definition 12 (Coherent set of events). Let $E$ be a finite set of events. $E$
is coherent iff there exists at least one voice assignment of $E$. $\diamond$

An example of a voice assignment is represented in FigjU 

From the previous definitions it follows: 

Proposition 1. Let $E$ be a coherent finite set of events. For any two voice
assignments of $E$, $\mathcal{V}_i$ and $\mathcal{V}_j$, we have that the number of voices of $\mathcal{V}_i$ is the
same as the number of voices of $\mathcal{V}_j$. $\diamond$

This proposition establishes that the number of voices in an assignment is a 
characteristic of the set of events that produced it. We can then define: 

Definition 13. Let $E$ be a coherent finite set of events. We define $Voices(E)$
as the number of voices of any of the voice assignments of $E$. $\diamond$

It is important to mention at this point that these definitions, although set
forth for small cells of a music piece (e.g. a measure or line), may be extended
to represent and relate a full music piece. Whole measures may be treated as
events and may be related by precedence relations. In multi-part pieces several
superimposed staves can be viewed as streams and may be related by simultaneity
relations (i.e. as suggested by extended measure bars across staves). This
hierarchical nature of the representation enables us to partition large music pieces,
by establishing additional constraints between the resulting smaller excerpts.

In a previous version of the RIEM system [2], a scheduling algorithm was
used to find assignments of events to voices. This algorithm succeeds only if the
set of events is coherent. A perfect recognition algorithm would produce a coherent
set of events from a piece of music written according to standard music notation,
but recognition is usually not perfect. Consequently, the success
of the scheduling algorithm cannot be guaranteed.

Usually, what is present at the input of the interpretation module is a
non-coherent set of events, mostly due to unrecognized objects. We must then envisage
a process to hypothesize on the missing events. This goal is twofold: on the one hand,
if we are able to produce a set of hypotheses on the unrecognized events, we can
feed them back to the recognition module and relax the algorithms there; on
the other hand, we can use these hypotheses to produce a voice assignment that,
even though it may not be a completely accurate representation of the music being
recognized, is definitely better than not having any.

The problem can be summarized in the following way: given a set of events $E$,
produced by the recognition module, find (abduce) a set of events $A$ such that
$E \cup A$ is coherent, i.e. so that a voice assignment can be made. The following
proposition guarantees that a solution $A$ always exists:

Proposition 2. For every finite set of events $E$, there is at least one finite set
of events $A$ such that $E \cup A$ is coherent. $\diamond$

As is typical in hypothetical reasoning systems, strengthened by the good 
performance rate of the recognition module of RIEM, we are interested in mini- 
mal solutions. 

Next we show how this can be accomplished by using Abductive Constraint 
Logic Programming (ACLP). 

5.2 Implementation 

As seen before, ACLP provides a declarative and flexible framework to solve the
hypothetical reasoning problem at hand.

The main components of the ACLP theory $\langle P, A, IC \rangle$ that allow us to abduce the
missing events (in practice, what is abduced are the assignments) and perform
the voice assignments are¹:

P : The Constraint Logic Program, P, contains:

— rules establishing the necessary spatial relationships (precedence and
simultaneity) between events, as in Defs. 6 and 7:

precede(Id1, Id2) ← X2 ≥ (X1 + C0),
    event(Id1, X1, _, _), event(Id2, X2, _, _).

simultaneous(Id1, Id2) ← X2 > (X1 − C0), X2 < (X1 + C0),
    event(Id1, X1, _, _), event(Id2, X2, _, _).

where the positive constant C0 is introduced to allow for a margin within
which events are still considered simultaneous;

¹ For clarity, these rules do not follow the exact syntax of the ACLP meta-interpreter.
Also, some implementation restrictions of the ACLP meta-interpreter, such as the
requirement that rules of P that depend on abducibles must be unfolded into the
constraints in IC, have not been followed, to make the presentation more understandable.

— rules defining the predicates:

voice_duration(Voice, Duration)
partial_duration(Voice, Event, Duration)

as in Defs. 9 and 10;

— a set of facts, representing the events output by the recognition module,
of the form:

event(Id, X, Y, DurList).

where DurList is a list of possible durations for the event;

— a rule creating events for every abduced assignment:

event(Id, X, Y, Dur) ← assign(Id, _, X, Y, Dur).

A : There is only one abducible predicate, which corresponds to the assignments
of events to voices, i.e. A = {assign/5}.

IC : The set of integrity constraints, IC, contains:

— an IC establishing that simultaneous events cannot be assigned to the same
voice, according to (1):

⊥ ← Id1 ≠ Id2, simultaneous(Id1, Id2),
    assign(Id1, Voice1, _, _, _), assign(Id2, Voice1, _, _, _).

— an IC establishing that the duration of all voices must be equal, according to
(4):

⊥ ← Dur1 ≠ Dur2,
    voice_duration(Voice1, Dur1), voice_duration(Voice2, Dur2).

— an IC establishing that every event must have an assignment, according to
(5). Furthermore, these assignments must have a duration that belongs to
the list of possible durations of the event:

⊥ ← Dur :: DurList,
    event(Id, _, _, DurList), not assign(Id, _, _, _, Dur).

— an IC establishing that the same event cannot be assigned to different voices,
according to (6):

⊥ ← Voice1 ≠ Voice2,
    assign(Id, Voice1, _, _, _), assign(Id, Voice2, _, _, _).

— an IC establishing that the partial durations, up to simultaneous events, of
two voices must be equal, according to (7):

⊥ ← Dur1 ≠ Dur2, simultaneous(Id1, Id2),
    assign(Id1, Voice1, _, _, _), partial_duration(Voice1, Id1, Dur1),
    assign(Id2, Voice2, _, _, _), partial_duration(Voice2, Id2, Dur2).

The module is called with the ACLP meta-predicate aclp_solve(true, [], A),
returning in A the set of abduced assignments.

Example 2. In the example of FigtU with event 62 not being recognized, the 
input of the Interpretation module would be the following set of events E (for 
clarity, the Id’s of the events have been set according to FigHJ: 

E = {event(e1, 42, 130, [1]), event(e3, 130, 132, [1]), event(e4, 182, 151, [1]),
     event(e5, 38, 200, [1]), event(e6, 85, 189, [1]), event(e7, 129, 201, [1]),
     event(e8, 180, 190, [1])}

with C0 = 20. A would be a solution provided by the interpretation module:

A = ∃X1, X2, X3, X4, X5 {assign(e1, X1, 42, 130, 1), assign(e3, X1, 130, 132, 1),
     assign(e4, X1, 182, 151, 1), assign(e5, X2, 38, 200, 1), assign(e6, X2, 85, 189, 1),
     assign(e7, X2, 129, 201, 1), assign(e8, X2, 180, 190, 1),
     assign(e2, X1, X3, X4, X5), X1 ≠ X2, X3 > 65, X3 < 105, X5 = 1}

representing the abduction of 8 assignments, where assign(e2, X1, X3, X4, X5)
corresponds to the assignment of the event that is missing from E. $\diamond$
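
The constraints attached to the abduced event in Example 2 can be recovered with plain arithmetic. The sketch below does not use ACLP and is not the authors' implementation; it is a hand-written check, with hypothetical helper names, that takes the two voices of the solution A as given and derives the missing duration and the admissible X range from the tolerance C0.

```python
C0 = 20  # simultaneity margin used in Example 2

# Recognized events of Example 2: id -> (X, Y, Dur)
E = {
    "e1": (42, 130, 1), "e3": (130, 132, 1), "e4": (182, 151, 1),
    "e5": (38, 200, 1), "e6": (85, 189, 1), "e7": (129, 201, 1),
    "e8": (180, 190, 1),
}

# Voices X1 and X2 of the solution A, before the missing event is added.
voice1 = ["e1", "e3", "e4"]
voice2 = ["e5", "e6", "e7", "e8"]


def duration(voice):
    """Total duration of a voice, as in Definition 9."""
    return sum(E[i][2] for i in voice)


def simultaneous(i, j):
    """Simultaneity with margin C0, as in the rule of program P."""
    return abs(E[i][0] - E[j][0]) < C0


def unmatched(voice, other):
    """Events of `voice` that have no simultaneous partner in `other`."""
    return [i for i in voice if not any(simultaneous(i, j) for j in other)]


missing_dur = duration(voice2) - duration(voice1)
partner = unmatched(voice2, voice1)
x = E[partner[0]][0]
x_range = (x - C0, x + C0)
print(missing_dur, partner, x_range)  # 1 ['e6'] (65, 105)
```

The result agrees with the constraints X3 > 65, X3 < 105 and X5 = 1 in A: the missing event must be simultaneous with e6 and must supply the one unit of duration that the first voice lacks.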

6 Conclusions and Future Work 

In this paper we proposed a hybrid system that bridges the gap between tra- 
ditional object recognition methods, used for low-level image processing, and 
abductive constraint logic programming used for high-level musical interpreta- 
tion. 

It is desirable that OMR systems be capable of detecting and correcting their
own recognition errors. For this, we propose an architecture to achieve a coherent
voice separation in the presence of uncertain or missing event information. Our
model deals with polyphonic scores with no limit on the number of voices.

We have formalized the process of voice separation, setting forth some properties
that, among other things, guarantee that a solution is always provided.
We then use ACLP to perform voice separation and achieve the necessary voice
assignments, while at the same time abducing information about missing events,
which is essential for providing feedback to the recognition algorithms.

The ability to perform hypothetical reasoning, characteristic of ACLP, together
with the use of feedback in our architecture, is a significant improvement
over the previous system. In a forthcoming paper, we will report the practical
results of our architecture's implementation.

As we relax the recognition algorithms, we increase the level of uncertainty;
therefore, the use of three-valued abduction could be advantageous. Also, new
developments related to the use of tabling techniques to perform abduction [1]
appear to be very promising, and could improve the performance of the system.

Abstract. In this paper we investigate the use of three evolutionary
based heuristics for the open shop scheduling problem. The intractability
of this problem is a motivation for the pursuit of heuristics that produce
approximate solutions. This work introduces three evolutionary based
heuristics, namely, a permutation genetic algorithm, a hybrid genetic
algorithm and a selfish gene algorithm, and tests their applicability to the
open shop scheduling problem. Several problem instances are used with
our evolutionary based algorithms. We compare the results and conclude
with some observations and suggestions on the use of evolutionary
heuristics for scheduling problems. We also report on the success that our
hybrid genetic algorithm has had on one of the large benchmark problem
instances: our heuristic has produced a better solution than the current
best known solution.



1 Introduction 

An Open Shop Scheduling Problem (OSSP) involves a collection of m machines
and a collection of n jobs. Each job comprises a collection of operations, sometimes
called tasks. Each machine can process at most one operation at a time
and each job can be processed by at most one machine at any given time. The 
order in which the jobs are processed on the machine, and the order in which the 
job is processed by the machines can be chosen arbitrarily. The goal of the open 
shop scheduling problem is to determine a feasible combination of the machine 
and job orders, in other words, a schedule, which minimizes the overall finishing 
time, also known as makespan. 

The OSSP has many applications, especially in the manufacturing world
and in industry. Consider for instance a large automotive garage with specialized
shops. A vehicle may require the following work (job in our terminology):
replace exhaust pipe and muffler, align the wheels and tune up. The three operations
of that one job may be carried out in any order. However, the exhaust
system, alignment, and tune-up shops are in different buildings, and it is therefore
not possible to perform two operations simultaneously. We also assume that
preemption is not at all desirable. In the OSSP, we assume that we have several
such jobs, i.e., several vehicles that need repair.

The OSSP belongs to the class of NP-hard problems. The partition problem 
can be reduced to it m- Thus, unless P = NP, the search for exact algorithms 
is replaced by the design of approximate algorithms. Incidentally, only a handful
of exact algorithms exist. A relatively fast branch and bound method for
the OSSP, which solved some benchmark problems to optimality, was recently
developed [3]. Other methods include one-to-one mapping of feasible schedules to
special Latin rectangles [1], and applying insertion techniques with rank minimal
schedules [2].

In the next section, we give the formal definition of the open shop scheduling 
problem. The paper then shifts to the evolutionary based heuristics. The different 
ingredients with justifications, for our genetic algorithm (GA) implementations, 
are then presented. More precisely, we discuss the Permutation GA and the
Hybrid GA that use LibGA [4]. The third evolutionary based heuristic, the Selfish
Gene Algorithm, is then introduced and discussed. In the fourth section, several 
problem instances are used with the three evolutionary based heuristics and the 
results are compared. We also report on the problem instance for which our Hy- 
brid GA produced a better solution than the currently best known solution. Our 
work concludes with some observations from our findings, and some suggestions 
on the use of evolutionary heuristics for scheduling optimization problems. 



2 The Open Shop Scheduling Problem 

The open shop scheduling problem consists of $m$ machines $M_1, M_2, \ldots, M_m$ and
$n$ jobs $J_1, J_2, \ldots, J_n$. Each job $J_i$ consists of $m$ operations $O_{ij}$ ($j = 1$ to $m$),
where $O_{ij}$ has to be processed on machine $M_j$ for $P_{ij}$ units of time without
preemption. Furthermore, as mentioned in the previous section, we assume that
each machine can process at most one operation at a time and each job can be
processed by at most one machine at any given time. The machine order for
each machine (the order in which the jobs are processed on the machine) and the
job order for each job (the order in which the job is processed by the machines)
can be chosen arbitrarily. The open shop scheduling problem is an optimization
problem in which we are to determine a feasible combination of the machine and
job orders, that is, a schedule, which minimizes a certain objective function: the
makespan.

A feasible OSSP schedule assigns a start time to each operation, satisfying 
the constraint that a machine can only process one operation at a time and 
that two or more operations from the same job cannot be processed at the same 
time. The main objective is to generate a schedule with a makespan as short as 
possible, where the makespan is the total elapsed time in the schedule. A non- 
preemptive schedule is one in which the operation being processed by a machine 
cannot be interrupted. In other words, the difference between the finish time and 
the start time for a particular operation is equal to the length of the operation 
i.e., finish time - the start time = Pij m-

Consider the 4x4 benchmark problem depicted in Table 1. Note that the
number of jobs is equal to the number of machines. Most benchmark problems
in the scheduling literature have this property. A schedule for the problem
instance of Table 1 is given in Table 2. We note that the operations are not
scheduled in their order of appearance in Table 1. Thus, operation $O_{32}$, for
instance, is scheduled at time 78 while operation $O_{31}$ is scheduled at time 226,
as can be seen in Table 2. Operation $O_{22}$ is the last one to finish, with “end
time” equal to 293; thus, the makespan of the schedule given in Table 2 is 293,
which happens to be the optimal solution for this problem.



Table 1. A 4 x 4 benchmark problem for the OSSP

Machines   Job J1   Job J2   Job J3   Job J4
M1           85       23       39       55
M2           85       74       56       78
M3            3       96       92       11
M4           67       45       70       75



Table 2. A schedule for the benchmark problem in Table 1 with the makespan = 293

Machine   Job   Operation   Operation Length   Start time   End Time
M1        J1    O11                85               0           85
M2        J4    O42                78               0           78
M3        J2    O23                96               0           96
M4        J3    O34                70               0           70
M4        J4    O44                75              78          153
M2        J3    O32                56              78          134
M1        J2    O21                23              96          119
M3        J1    O13                 3              96           99
M3        J3    O33                92             134          226
M1        J4    O41                55             153          208
M2        J1    O12                85             134          219
M4        J2    O24                45             153          198
M4        J1    O14                67             219          286
M1        J3    O31                39             226          265
M2        J2    O22                74             219          293
M3        J4    O43                11             226          237

In the next section, we introduce the three evolutionary based heuristics we 
implemented and ran with OSSP benchmark problems taken from a well known 
source of test problems. 

3 Genetic Algorithms for OSSP 

In this work, we use three genetic algorithms. We start our discussion by
presenting the Permutation GA and then turn our attention to the Hybrid GA.
We finally introduce a relatively new variant of GA known as the Selfish Gene
Algorithm [^. For each genetic algorithm, we introduce the chromosome represen- 
tation and the stochastic operators of crossover and mutation. All three genetic 
algorithms use the makespan as fitness function. In other words, every chromo- 
some in the population is decoded yielding a schedule, and that schedule has a 
makespan which determines how “fit” that chromosome is. 

3.1 The Permutation Genetic Algorithm 

Each operation of a job is given a unique number. We first number the operations
of job $J_1$, then the operations of job $J_2$, and so on. Thus, for a problem with
3 jobs and 3 machines, the first operation of job $J_1$ will be given number 1,
and the last operation of job $J_3$ will be given number 9. For the individual
chromosomes of the GA, we use strings of length $p$, where $p$ is the total number
of operations involved. A scheduling of the operations is represented by using a
non-binary string, $y_1, y_2, \ldots, y_p$, where the value of $y_i$ represents the operation to
be scheduled next. Thus, if we have 3 jobs and 3 machines, the string of Figure 1
represents a scheduling of the nine operations.

Fig. 1. Chromosome representation for the 3x3 OSSP

The chromosome of Figure 1 is interpreted as follows. First schedule the operation
that has number 7, then the operation with the number 5, and so on.

In our study, we use a generational genetic algorithm with roulette wheel
selection, uniform crossover, and swap mutation. Since our goal is to minimize
the makespan, each chromosome in the population occupies a slot on the roulette
wheel of size inversely proportional to its fitness value. The procedure for
calculating makespan is as follows:

procedure makespan
begin
    i = 0;
    while i is less than the number of jobs times the number of machines do
    // loop count = length of chromosome
    begin
        determine the job number;
        determine the machine number;
        get the length of the operation;
        schedule the operation;
        i = i + 1;
    end
    return the end time of the machine that finishes last (this is
    the makespan of the current chromosome that we are evaluating);
end.

Procedure makespan's loop starts by determining the job number, machine
number and the operation length. The next step consists in scheduling the
operation. Our goal is to schedule an operation at the earliest possible time.
Scheduling an operation at the earliest possible time is not as obvious as it might
seem.

Consider the following scenario (see Figure 2): we want to schedule operation
c from job $J_c$ of length $t$ on machine $M_j$.

Fig. 2. Scheduling operation c between operations a and b on machine $M_j$
(time axis of machine $M_j$ marked w, x, y, z)

Our algorithm scans the operations that have been scheduled on $M_j$. If there
is a “gap” between two consecutive operations, a and b, such that $y - x > t$, where
x is the finishing time of a and y is the starting time of b, then we check to see if
operation c could be scheduled between times x and y. Operation c is schedulable
between x and y only if no other operation from job $J_c$ is being processed on
some other machine between times x and y. If no such gap exists, then operation
c is scheduled sometime after the last operation that was processed on machine
$M_j$.
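
The decoding of a permutation chromosome with gap filling can be sketched in Python as follows. This is one possible reading, not the authors' LibGA code, and it rests on assumptions that are labeled in the comments: operation k is taken to be the operation of job (k-1) div m on machine (k-1) mod m, the helper `earliest_start` looks for the earliest start at which both the machine and the job are free (a slightly more general test than checking the interval between x and y), and a gap that fits exactly (y - x = t) is accepted, whereas the text states y - x > t.

```python
def earliest_start(machine_busy, job_busy, t):
    """Earliest s >= 0 such that [s, s + t) is free on the machine and for the job."""
    s = 0
    for b_start, b_end in sorted(machine_busy + job_busy):
        if s + t <= b_start:
            break              # the operation fits in the gap before this interval
        s = max(s, b_end)      # otherwise move past the busy interval
    return s


def makespan(chromosome, p):
    """Decode a permutation chromosome; p[i][j] is the time of job i on machine j."""
    n, m = len(p), len(p[0])
    machine_busy = [[] for _ in range(m)]
    job_busy = [[] for _ in range(n)]
    for op in chromosome:
        # Assumed numbering: operations of job 1 first, in machine order.
        job, mach = divmod(op - 1, m)
        t = p[job][mach]
        s = earliest_start(machine_busy[mach], job_busy[job], t)
        machine_busy[mach].append((s, s + t))
        job_busy[job].append((s, s + t))
    return max(end for busy in machine_busy for _, end in busy)


# Table 1, indexed as P[job][machine] (0-based).
P = [[85, 85, 3, 67],
     [23, 74, 96, 45],
     [39, 56, 92, 70],
     [55, 78, 11, 75]]

# The operations of Table 2, numbered as above and taken in the order of its rows.
order = [1, 14, 7, 12, 16, 10, 5, 3, 11, 13, 2, 8, 4, 9, 6, 15]
print(makespan(order, P))  # 293
```

Under these assumptions, decoding the operations in the row order of Table 2 reproduces the start times of that table and hence its makespan of 293.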

3.2 The Hybrid Genetic Algorithm 

Genetic algorithms along with heuristics are used in this approach. Hybridizing 
a GA with a heuristic to search the space for candidate solutions can be found 
throughout the literature of genetic algorithms 0. 

The chromosome is a string of length $p$, where $p$ is the total number of
operations involved. Each gene can take alleles in the range $1, 2, \ldots, j$, where $j$ is
the largest job number. Thus, if we have 3 jobs and 3 machines, the string of
Figure 3 represents a scheduling of the nine operations.

Fig. 3. Chromosome representation for the 3x3 OSSP

The chromosome of Figure [3] is interpreted in the following way. First, we 
choose an untackled operation for the first job, and schedule it. Then choose an 
untackled operation from the second uncompleted job. Since the third position 
of the chromosome in Figure [3] has the value 1, schedule an untackled operation 
from the first job again. 

Building a schedule is accomplished by the schedule builder, which maintains
a circular list of uncompleted jobs and a list of untackled operations for each
job. The “next uncompleted job” is determined modulo the length of
the circular list. For example, by the time we get to the 8th position of the
chromosome of Figure 3, all the operations of job 1 would have been tackled.
Positions 8 and 9 will choose the last untackled operations of jobs 2 and 3,
respectively.

We have yet to establish how the choice among the remaining untackled
operations is performed. In our implementation, we let the schedule builder 
choose the operation with the Largest Processing Time (LPT), breaking ties 
according to an a priori ordering of the operations[H]. 

Once again, our objective is to minimize makespan, which is used as the fitness
function. Procedure hybrid_makespan used in this approach differs slightly
from procedure makespan in Section 3.1, since here we have to incorporate the
LPT heuristic.

procedure hybrid_makespan
begin
    i = 0;
    while i is less than the number of jobs times the number of machines do
    // loop count = length of chromosome
    begin
        determine the uncompleted job number;
        determine the machine number for the job that has the
        longest operation processing time;
        get the length of the operation;
        schedule the operation;
        i = i + 1;
    end
    return the end time of the machine that finishes last (this is
    the makespan of the current chromosome that we are evaluating);
end.



Once the job number, machine number and the operation length are determined,
the scheduling of the chosen operation proceeds in the same fashion as
was described in Section 3.1.
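
The schedule builder of the Hybrid GA can be sketched along the same lines. The Python code below is an illustration under assumptions, not the authors' implementation: a gene value g selects the job at position (g-1) modulo the length of the list of uncompleted jobs, LPT ties are broken by the lowest machine index (the source only says that an a priori ordering is used), and `earliest_start` is a hypothetical helper that finds the earliest time at which both the machine and the job are free.

```python
def earliest_start(machine_busy, job_busy, t):
    """Earliest s >= 0 such that [s, s + t) is free on the machine and for the job."""
    s = 0
    for b_start, b_end in sorted(machine_busy + job_busy):
        if s + t <= b_start:
            break
        s = max(s, b_end)
    return s


def hybrid_makespan(chromosome, p):
    """Decode a Hybrid GA chromosome; p[i][j] is the time of job i on machine j."""
    n, m = len(p), len(p[0])
    uncompleted = list(range(n))                      # circular list of jobs
    untackled = {i: set(range(m)) for i in range(n)}  # machines left per job
    machine_busy = [[] for _ in range(m)]
    job_busy = [[] for _ in range(n)]
    for gene in chromosome:
        # The gene picks a job among the uncompleted ones, modulo the list length.
        job = uncompleted[(gene - 1) % len(uncompleted)]
        # LPT: longest untackled operation; ties go to the lowest machine index
        # (the tie-breaking order is an assumption of this sketch).
        mach = max(sorted(untackled[job]), key=lambda j: p[job][j])
        t = p[job][mach]
        s = earliest_start(machine_busy[mach], job_busy[job], t)
        machine_busy[mach].append((s, s + t))
        job_busy[job].append((s, s + t))
        untackled[job].remove(mach)
        if not untackled[job]:
            uncompleted.remove(job)
    return max(end for busy in machine_busy for _, end in busy)


p = [[3, 2],
     [2, 4]]
print(hybrid_makespan([1, 2, 1, 2], p))  # 6
print(hybrid_makespan([1, 1, 1, 1], p))  # 9
```

In the second call the first job is completed after two genes, so the remaining genes, although still equal to 1, select the only job left in the circular list.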

In this implementation we use a generational genetic algorithm, with roulette
wheel selection, uniform crossover and swap mutation.
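
The three operators named above can be illustrated with a few lines of Python. This is a generic sketch with hypothetical function names, not the LibGA implementation; it is written for Hybrid GA chromosomes, whose genes may repeat, since a position-wise uniform crossover applied to two permutations can produce duplicated operation numbers.

```python
import random


def roulette_select(population, makespans, rng):
    """Pick one chromosome; slot size is inversely proportional to its makespan."""
    weights = [1.0 / ms for ms in makespans]
    return rng.choices(population, weights=weights, k=1)[0]


def uniform_crossover(a, b, rng):
    """Each gene of the child is copied from one of the two parents at random."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def swap_mutation(chromosome, rng):
    """Exchange the genes at two distinct, randomly chosen positions."""
    c = list(chromosome)
    i, j = rng.sample(range(len(c)), 2)
    c[i], c[j] = c[j], c[i]
    return c


rng = random.Random(0)
a = [1, 2, 3, 1, 2, 3, 1, 2, 3]
b = [3, 3, 3, 2, 2, 2, 1, 1, 1]
parent = roulette_select([a, b], [300, 350], rng)
child = swap_mutation(uniform_crossover(a, b, rng), rng)
print(parent, child)
```

With makespans of 300 and 350, the first chromosome receives the larger slot, so it is selected slightly more often than the second.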

3.3 The Selfish Gene Algorithm 

The field of Evolutionary Computation is based on search and optimization al- 
gorithms that were inspired by the biological model of Natural Selection. Several 
different algorithmic paradigms, among which we find Genetic Algorithms, Ge- 
netic Programming and Evolutionary Programming, were proposed after the 
Darwinian theory. Their underlying common assumption is the existence of a 
population of individuals that strive for survival and reproduction. Under this 
assumption, the basic unit of the evolution is the individual, and the goal of
the algorithm is to find an individual of maximal fitness [^, i.e., an individual of
smallest fitness value, since the OSSP is a minimization problem.

The work of R. Dawkins[^ has put evolution in a different perspective, where 
the fundamental unit of evolution is the gene, rather than the individual. This 
view is not in contrast with classical Darwinism, but provides an alternative 
interpretation, that is formalized by the Selfish Gene Theory. In this theory, 
individual genes strive for their appearance in the genotype of individuals, whe- 
reby individuals themselves are nothing more than vehicles that allow genes to 
reproduce. In a population, the important aspect is not the fitness of various in- 
dividuals, since they are mortal, and their good qualities will be lost with their 
death. Genes, on the other hand, are immortal, in the sense that a given frag- 
ment of chromosome can replicate itself to the offspring of an individual, and 
therefore it survives its death. Genes are selected by evolution on the basis of 
their ability to reproduce and spread in the population: the population itself can 
therefore be seen as a pool of genes. Due to the shuffling of genes that takes place 
during sexual reproduction, good genes are those that give higher reproduction 
probabilities to the individuals they contribute to build, when combined with 
the other genes in the same individual. 

Following the Selfish Gene Theory, the Selfish Gene Algorithm (FGA) neither 
relies on crossover nor needs to model a particular population. Instead it works 
on a Virtual Population (VP), which models the gene pool concept via statistical 
measures. To avoid confusion, for each gene we will explicitly distinguish between 
its location in the genome (the locus) and the value appearing at that locus (the 
allele). Each potential solution is modeled as a genotype, where each locus can 
be occupied by one of several possible alleles. In the FGA, different alleles fight 
to be presented in a specific locus. The “goodness” of each allele is represented 
by its higher frequency in the Virtual Population. The fight is actually performed 
at the phenotypic level, via a suitably defined fitness function. 

As with other Evolutionary Algorithms, an individual in the FGA is repre- 
sented by its genome. Let g be the number of loci in the genome; each locus Li 
{i = 1 ... g) into the genome can be occupied by different gene values, called 
alleles. The alleles that can occupy locus Li are denoted with (j = 1 ... rii) 
and are collectively represented as a vector Ai = (a^i, 0 ^ 2 , ■ • ■ , aim)- 

In the VP, due to the number of possible combinations, genomes tend to be 
unique, but some alleles might be more frequent than others. In the FGA, the 
goodness of an allele is measured by the frequency with which it appears in the 
VP. Let $p_{ij}$ be the marginal probability for $a_{ij}$ that conceptually represents the
statistical frequency of the allele $a_{ij}$ in locus $L_i$ within the whole VP, regardless
of the alleles found in other loci. Marginal probabilities of alleles in $A_i$ for locus
$L_i$ are collected in the vector $P_i = (p_{i1}, \ldots, p_{in_i})$. The VP can therefore be
statistically characterized by all marginal probability vectors $P = (P_1, \ldots, P_g)$.
Note that P is not a matrix, because the number of alleles for each locus can be 
different |^. 

The pseudocode for the main procedure in the FGA is as follows:

procedure FGA
begin
    genome B, G1, G2;
    initialize all probabilities p_ij to 1/n_i;
    B = select_individual();    (best so far)
    do
    begin
        G1 = select_individual();
        G2 = select_individual();
        if (fitness(G1) < fitness(G2)) then
        begin
            reward_alleles(G1);
            penalize_alleles(G2);
            if (fitness(G1) < fitness(B)) then B = G1;
        end
        else
        begin
            reward_alleles(G2);
            penalize_alleles(G1);
            if (fitness(G2) < fitness(B)) then B = G2;
        end
    end
    while (steady_state() == FALSE);
    find makespan given by B;
    print the schedule and makespan given by B;
    return B;
end
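
The main loop above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the update rule (moving a fixed amount `epsilon` of probability mass from the loser's allele to the winner's allele at every locus where they differ) is an assumption, the names `selfish_gene`, `alleles`, `epsilon` and `threshold` are hypothetical, and mutation is left out.

```python
import random


def selfish_gene(alleles, fitness, epsilon=0.04, max_iters=10_000,
                 threshold=0.95, seed=None):
    """Minimise `fitness` over genomes; alleles[i] lists the values allowed at locus i."""
    rng = random.Random(seed)
    # Virtual Population: one marginal probability vector per locus, initially uniform.
    probs = [[1.0 / len(a)] * len(a) for a in alleles]

    def select_individual():
        # Draw one allele index per locus according to the marginal probabilities.
        return [rng.choices(range(len(p)), weights=p)[0] for p in probs]

    def decode(individual):
        return [alleles[i][j] for i, j in enumerate(individual)]

    def update(winner, loser):
        # Assumed rule: shift up to `epsilon` of probability from the loser's
        # allele to the winner's allele, so each vector still sums to 1.
        for i, (w, l) in enumerate(zip(winner, loser)):
            if w != l:
                delta = min(epsilon, probs[i][l])
                probs[i][l] -= delta
                probs[i][w] += delta

    best = select_individual()
    best_fit = fitness(decode(best))
    for _ in range(max_iters):
        g1, g2 = select_individual(), select_individual()
        f1, f2 = fitness(decode(g1)), fitness(decode(g2))
        if f1 < f2:
            winner, loser, f_win = g1, g2, f1
        else:
            winner, loser, f_win = g2, g1, f2
        update(winner, loser)
        if f_win < best_fit:
            best, best_fit = winner, f_win
        # Steady state: every locus has one allele with probability >= threshold.
        if all(max(p) >= threshold for p in probs):
            break
    return decode(best), best_fit
```

A call such as `selfish_gene([[0, 1, 2, 3]] * 5, sum, seed=1)` returns a genome of five alleles together with its fitness; for the scheduling problem, `fitness` would be the makespan of the schedule decoded from the genome.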



An individual is represented by its chromosome. The number of genes in 
a chromosome is equal to twice the number of jobs (n), times the number of 
machines (m). The representation is different from the Hybrid and the Permu- 
tation Genetic Algorithms which have only n x m genes in the chromosome. In 
the chromosome, the odd numbered loci are interpreted as jobs and the even 
numbered loci are interpreted as operations within a job.

Fig. 4. Chromosome representation for the 3x3 OSSP

We use the above representation because we wanted to avoid using a heu- 
ristic. Thus, the chromosome completely determines which operation is to be 
scheduled next. The chromosome in Figure 4 is interpreted as follows: place the
2nd uncompleted operation of the 1st uncompleted job in the earliest place where
it will fit in the schedule, then place the 3rd uncompleted operation of the 1st
uncompleted job in the earliest place where it will fit in the schedule, and so on. 
Although no heuristic is used to determine the next operation to be scheduled, 
the schedule builder uses primitive data structure operations to keep track of 
the untackled operations and jobs. 
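
The decoding described above can be sketched as a small schedule builder. This is an illustrative sketch under stated assumptions: genes are taken as 0-based indices and reduced modulo the number of remaining jobs or operations so that every gene is valid, and the names `earliest_start`, `build_schedule` and `times` are hypothetical.

```python
def earliest_start(busy, duration):
    """Earliest time t >= 0 such that [t, t + duration) overlaps no busy interval."""
    t = 0
    for start, end in sorted(busy):
        if t + duration <= start:
            break  # the gap before this interval is large enough
        t = max(t, end)
    return t


def build_schedule(chromosome, times):
    """Decode gene pairs (job index, operation index) into an open shop schedule.

    times[j][k] is the processing time of job j on machine k.
    Returns (schedule, makespan), where schedule maps (job, machine) -> start time.
    """
    n, m = len(times), len(times[0])
    remaining = {j: list(range(m)) for j in range(n)}  # uncompleted operations per job
    job_busy = {j: [] for j in range(n)}
    machine_busy = {k: [] for k in range(m)}
    schedule, makespan = {}, 0
    for pos in range(0, len(chromosome), 2):
        if not remaining:
            break  # every operation is already scheduled
        jobs = sorted(remaining)  # uncompleted jobs, in index order
        job = jobs[chromosome[pos] % len(jobs)]
        ops = remaining[job]
        machine = ops.pop(chromosome[pos + 1] % len(ops))
        if not ops:
            del remaining[job]
        duration = times[job][machine]
        # The operation needs both its job and its machine to be free.
        start = earliest_start(job_busy[job] + machine_busy[machine], duration)
        interval = (start, start + duration)
        job_busy[job].append(interval)
        machine_busy[machine].append(interval)
        schedule[(job, machine)] = start
        makespan = max(makespan, start + duration)
    return schedule, makespan
```

For a 2x2 instance with `times = [[3, 2], [2, 4]]` and the chromosome `[0] * 8`, the builder schedules job 0 on machines 0 and 1, then job 1 on machines 0 and 1, each at the earliest feasible time.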

We calculate our objective function or makespan in the same manner as in 
procedure makespan in Section 3.1. 

To determine the efficiency of our three heuristics, we use well known bench- 
mark problems to test them. 



4 Experimental Results 

Our test data consists of 36 benchmark problems that are well known in the lite- 
rature [To] . Rather than explicitly giving the data, Taillard gives the pseudocode 
to generate the individual problem instances. We implemented, in C, Taillard’s 
algorithm and executed it to generate the 36 benchmark problems. The gene- 
rator takes as input time seed, machine seed and the problem size which are 
specified in m- We divided the benchmark problems into three categories. The 
small size problems are the 4x4 and 5x5 problem instances. The medium size 
benchmark problems consist of the 7x7 and 10 x 10 problem instances. The 
15 X 15 and 20 x 20 problem instances constitute the large size benchmark pro- 
blems. The current best and theoretical lower bound for each problem instance 
can be found in [ID] and in |^. 

In our study we employ a modified LibGA[l] package with elitism, which we 
found to be superior to other GA variants. The crossover and mutation rates 
for both algorithms, the Permutation GA and the Hybrid GA, were 0.6 and 
0.1, respectively. As for the FGA, the probability of mutation was set to 0.1, 
and the value for rewarding or penalizing alleles was set to 0.04. The FGA was 
terminated after 10,000 iterations or if each locus reached a certain value with a 
probability of 0.95, whichever occurred first. For comparison reasons, we chose 
10,000 iterations, since the other GA algorithms run for 500 generations with a 
pool size of 200 (thus yielding 10,000 computations). 

All three genetic algorithms were run 100 times for each benchmark problem. 
The best result and the mean of the 100 runs are tabulated in Tables 3, 4, and 
5. Table 3 reports the results of the three genetic algorithms on the small size 
benchmarks. As can be seen in Table 3, the Permutation GA and the FGA 
obtained the current best solutions for all 10 4 x 4 benchmark problems, and 
outperformed the Hybrid GA in all 16 problem instances. As for the 5x5 problem 
instances, the Permutation GA and the FGA either obtained the current best 
or produced a schedule whose makespan is within 3% of the current best. 

Table 3. Test results for 4x4 and 5x5 benchmarks

Name       Size     Current  Lower    Permutation GA  Hybrid GA     FGA
                    best     bound    Best    Mean    Best   Mean   Best   Mean
in41.dat   4x4      193      186      193     194     213    213    193    194
in42.dat   4x4      236      229      236     240     240    244    236    241
in43.dat   4x4      271      262      271     271     293    293    271    272
in44.dat   4x4      250      245      250     252     253    255    250    252
in45.dat   4x4      295      287      295     299     303    304    295    299
in46.dat   4x4      189      185      189     192     209    219    189    194
in47.dat   4x4      201      197      201     202     203    203    201    203
in48.dat   4x4      217      212      217     219     224    224    217    220
in49.dat   4x4      261      258      261     264     281    281    261    265
in410.dat  4x4      217      213      217     219     230    230    217    221
in51.dat   5x5      300      295      301     312     323    324    302    313
in52.dat   5x5      262      255      262     271     269    279    262    274
in53.dat   5x5      326      321      331     345     353    355    326    343
in54.dat   5x5      312      307      312     328     327    339    318    328
in55.dat   5x5      353      349      353     367     373    376    357    371
in56.dat   5x5      326      321      326     340     341    343    326    341

Table 4 reports the results of the three genetic algorithms on the medium size
benchmarks. All three algorithms fared better with the 7x7 problem instances
than with the 10 x 10 benchmark problems. Although the three algorithms were
not able to obtain the current best solutions, the Permutation GA produced
schedules whose makespan is within 5.5% of the current best, the Hybrid GA
produced schedules whose makespan is within 3.5% of the current best, and
the FGA produced schedules whose makespan is within 5.5% of the current
best.

Table 5 reports the results of the three genetic algorithms on the large size 
benchmark problems. As can be seen in Table 5, the Hybrid GA outperformed by 
far the Permutation GA and the FGA in all 10 problem instances. The Permu- 
tation GA produced schedules whose makespan is within 6.5% from the current 
best, and the FGA produced schedules whose makespan is within 9.5% from the 
current best. As for the Hybrid GA, it produced schedules whose makespan is 
the current best for all 15 x 15 problem instances, and all but one 20 x 20 pro- 
blem instances. It missed finding the current best for problem instance in201.dat 
by less than 1%. More importantly, for in204.dat, our Hybrid GA produced a 
schedule whose makespan is 1207, which is better than the current best of 1209. 

5 Conclusion 

In this work, three evolutionary heuristics are used as approximation
algorithms for the open shop scheduling problem. Several problem instances are
used with our evolutionary algorithms. The results are compared, and some
observations and suggestions on the use of evolutionary heuristics for
scheduling problems are given. We also report an improvement on the previously best
known solution for a benchmark 20 x 20 OSSP that our hybrid genetic algo- 
rithm produced. It comes as no surprise that the Hybrid GA outperforms the 
other two genetic algorithms for large problem instances. We observe that
although the results obtained in this work are of a preliminary nature, we believe
that hybrid methods will have a lot to offer in the future. In our work, we
reject the conventional wisdom of using binary representations in favor of more
direct encodings. Our non-binary chromosomes represent implicit instructions
for building a schedule. One last advantage genetic algorithms have over other
more traditional methods is that genetic algorithms are far easier to parallelize
than typical knowledge-based and operations research methods.

Table 4. Test results for 7x7 and 10 x 10 benchmarks

Name       Size     Current  Lower    Permutation GA  Hybrid GA     FGA
                    best     bound    Best    Mean    Best   Mean   Best   Mean
in71.dat   7x7      435      435      438     462     447    455    446    462
in72.dat   7x7      449      443      455     477     454    460    463    481
in73.dat   7x7      435      422      443     464     450    456    450    469
in74.dat   7x7      460      458      465     483     467    475    469    485
in75.dat   7x7      400      398      405     426     406    411    411    430
in101.dat  10 x 10  641      637      667     705     655    672    695    716
in102.dat  10 x 10  577      577      586     618     581    589    611    624
in103.dat  10 x 10  538      538      555     583     541    549    611    624
in104.dat  10 x 10  595      595      627     646     598    618    646    660
in105.dat  10 x 10  602      596      623     645     605    618    638    655

Table 5. Test results for 15 x 15 and 20 x 20 benchmarks

Name       Size     Current  Lower    Permutation GA  Hybrid GA     FGA
                    best     bound    Best    Mean    Best   Mean   Best   Mean
in151.dat  15 x 15  937      937      967     998     937    948    990    1016
in152.dat  15 x 15  871      871      904     946     871    886    950    971
in153.dat  15 x 15  934      934      969     992     934    944    992    1012
in154.dat  15 x 15  893      893      928     962     893    905    955    977
in201.dat  20 x 20  1155     1155     1230    1269    1165   1190   1264   1292
in202.dat  20 x 20  1257     1257     1292    1346    1257   1267   1336   1378
in203.dat  20 x 20  1256     1256     1315    1353    1256   1267   1359   1386
in204.dat  20 x 20  1209     1204     1266    1305    1207   1224   1312   1386
in205.dat  20 x 20  1289     1289     1339    1380    1289   1293   1390   1417
in206.dat  20 x 20  1241     1241     1307    1343    1241   1246   1342   1381

Abstract. There are presently many seemingly different optimization
algorithms, based on unrelated paradigms. Although some nice and
important intuitions support those heuristics, there is (to our knowledge) 
no rigorous and systematic approach on how to relate them. Herein we 
present a framework to encompass those heuristics, based on the multiset 
formalism, providing a common working structure and a basis for their 
comparison. We show how to express some well known heuristics in our 
framework and we present some results on relations among them. 



Keywords: Optimization Heuristics, Multisets, Evolutionary Algorithms 

1 Introduction 

There are presently many different optimization algorithms, based on a large 
amount of different paradigms. Especially in optimization heuristics, we notice 
many common features (type and sequence of operators, for instance) that lead 
us to search for a basic scheme [Ree95]. Therefore, in this work we use a general
formal framework (PLATO) for population based optimization heuristics,
using a multiset based representation [ACMP99]. We hope to provide a better
understanding of the workings of those algorithms, which we claim share a com- 
mon structure, and to allow construction of hybrid algorithms in a simple way 
by a mere combination of standard operators. Furthermore we will conclude by 
arguing that this formalism may also be used as an implementation guide, pro- 
viding new interesting features to a population based optimization algorithm. 
Some of the algorithms we try to cover with this model are not usually taken as 
population based (e.g. tabu search) but we show that they can be easily extended 
towards that direction. 

In what follows, we consider, as the problem to solve, the optimization of a 
function f{X), with X being the vector of the problem variables or of an appro- 
priate codification of their values. There may be a transform converting f{X) 
into a fitness function which we will generally use. The algorithms here consi- 
dered do not need any other information about f{X), such as derivatives, etc. 
In our approach the general representation of the population based optimization
algorithms is depicted in Fig. 1. We consider $P_0$ as the observable population in
each iteration of the algorithm. For the sake of generality, there is also a control 
module, responsible for changes of the algorithm parameters or for alterations 
to the observable population between iterations. Due to space limitations this 
module will not be mentioned in this paper and we concentrate on presenting 
the optimization algorithm module. 




Fig. 1. Population based optimization algorithm 



The decomposition of the optimization algorithm module in its operators is 
presented in Fig. 2. Each rectangular box represents one operator over a population.
The first operator in each iteration is Selection for Reproduction which
selects the individuals to use in further operations. Next there is the Reproduc- 
tion operator which generates new individuals, based on the ones provided by 
the previous operator. Following we have the Mutation operator. Its function is 
to provide random changes to the input individuals, on a one to one basis. Then 
the Decimation operator is applied, to remove individuals from the population, 
for instance those with low fitness or lying outside the domain of the fitness 
function. The last operator Replacement is also a form of selection. It takes the 
population produced in the current iteration cycle and the input observable po- 
pulation to the same iteration, and selects individuals from both to produce the 
next iteration’s observable population. 



[Figure 2 groups: Population based operators | Individual manipulation operators]




Fig. 2. Optimization Algorithm Operators

Notice that in Fig. 2 we have divided the modules into two different groups,
corresponding to the two basic families of operations in population based
heuristics: Population based operations (Selection for Reproduction, Decimation and
Replacement), which compare individuals in a population taking into account
the fitness function and retain only some of them for further use, and Individual
based operations (Reproduction and Mutation), which manipulate the individuals'
representations.

We claim that this general framework complies to different algorithms (see 
for instance IdJSQSI l. such as GAs, Evolutionary Strategies, Simulated Annea- 
ling, Tabu Search and Hill Climbing, among others. In the detailed description of 
operators that follows we give examples of specific operators for these algorithms, 
as well as a section with some examples of the complete configuration of the fra- 
mework for those examples. First we present one section with the basic concepts 
of the formalism, to allow the understanding of the operator descriptions. We 
close the paper with a section of conclusions and ideas for future work. 

2 Basic Formalism 

The basic concept in this type of algorithms is Population. Intuitively, a popu- 
lation is a set of individuals (possibly empty). However, since a population may 
have repeated individuals we will characterize it as a multiset of individuals, 
and, thus we may refer to the number of times a given individual is present in a 
population. 

An individual is characterized by a chromosome and a personal history (me- 
mory), so in general two individuals may have the same chromosome although 
different memories. One obvious possible use of memory is to record the gene- 
ration number when the chromosome appeared in the population, or even to 
record parenthood information. Some algorithms (e.g. Tabu Search [GL95]) definitely
adopt this characterization and use this notion of memory associated with an
individual.

Definition 1 (Individual). An individual is a pair (c, m) where c is a chro- 
mosome and m is its associated memory. 

If (c, m) is an individual then we may refer to its genetic material as the set 
of values (alleles) present in its chromosome c. 

Since we may have individuals with the very same chromosome but different 
memories we introduce the following definitions: 

Definition 2 (Equality and cloning). Given individuals $i_1 = (c_1, m_1)$ and
$i_2 = (c_2, m_2)$ we say that $i_1$ and $i_2$ are clones iff $c_2 = c_1$. Moreover we say that
any two individuals are identical if and only if they are two clones with the same
memory. We do not distinguish between identical individuals.

Note that two individuals being identical means (among other things) they 
are clones, although the converse is not true. A simple case is when we consider
memory to be just the individual's age. It may be the case that another individual
(a clone) appears in a different generation. However, if age is considered,
then, although being clones, they are not identical. Furthermore, it is easy to see
that, for memoryless populations, being a clone is equivalent to being identical.

Definition 3 (Search Space). The search space $\Omega$ is defined by a set of pairs

$$\Omega = \{(c, m) : c \in \mathcal{C},\ m \in \mathcal{H}\} \qquad (1)$$

where $\mathcal{C}$ is the chromosome space (a chromosome being the coding of the input
variables) and $\mathcal{H}$ is the individuals' memory space.

Therefore, a population will be a collection of individuals with possible iden- 
ticals. In order to accommodate this multitude of identicals we use the multiset 
formalism. If the reader is not familiar with this formalism we strongly recom- 
mend reading its description in the Annex. 

Definition 4 (Population, Support). If $\Omega$ is the search space then a population
is a mapping $P : \Omega \to \mathbb{N}$ such that the set $\{\omega \in \Omega \mid P(\omega) \neq 0\}$ is finite. If
$P$ is a population we define its support as the set

$$\hat{P} = \{\omega : \omega \in P\}.$$

The population $P = \{\{1, 1, 1, 3, 4, 4\}\}$ has support set $\hat{P} = \{1, 3, 4\}$ and $P(1) = 3$,
$P(3) = 1$ and $P(4) = 2$.

Definition 5 (Population Cardinality). Let $P$ be a multiset representing a
population. The cardinality of the population ($|P|$) is defined by

$$|P| = \sum_{\omega \in \hat{P}} P(\omega).$$

Note that a population may contain a finite number of identical individuals
although, by definition of set, there are no identical elements in $\hat{P}$. We denote
the cardinality of a set $A$ by $|A|$. Clearly $|\hat{P}| \le |P|$ always holds. In the previous
example $|P| = 6$ and $|\hat{P}| = 3$. It should be stressed that the support cardinality
provides a simple measure of population diversity.
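
As a small illustration of these definitions, a population can be held in Python's `collections.Counter`, which maps each individual to its multiplicity. The variable names are illustrative only, and individuals are plain integers as in the example above.

```python
from collections import Counter

# The population {{1, 1, 1, 3, 4, 4}} as a multiset: individual -> multiplicity.
P = Counter([1, 1, 1, 3, 4, 4])

support = set(P)                # distinct individuals (the support)
cardinality = sum(P.values())   # population cardinality, counting repetitions
diversity = len(support)        # support cardinality, a simple diversity measure
```

Here `P[1]` gives the number of identicals of individual 1, and `diversity` can never exceed `cardinality`.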

All operators of the algorithm in Fig. 2 denote a mapping $\mathcal{M}(\Omega) \to \mathcal{M}(\Omega)$,
except for the Replacement operator where the mapping is
$\mathcal{M}(\Omega) \times \mathcal{M}(\Omega) \to \mathcal{M}(\Omega)$.

3 Population Based Operators 

Population based operators are usually known as selector operators and may 
subdivide into three distinct operators: Selection for Reproduction, Decimation 
and Replacement. 

Any selection operator uses individual’s fitness (which depends on the objec- 
tive function). As a matter of fact, in the general scheme here presented, these 
are the only three operators where the fitness function is used.

3.1 Selection for Reproduction

The basic idea of Selection for Reproduction ($\sigma$) is to determine which
individuals of the actual population will produce descendants, and it is the very first
operator applied in each iteration (generation).

Formally, the $\sigma$ operator is a function that, from a population, produces
another population (of reproducers), not necessarily of the same size:

$$\sigma : \mathcal{M}(\Omega) \to \mathcal{M}(\Omega).$$

We will designate by $I$ the input population and by $O$ the output population of
this operator, with supports $\hat{I}$ and $\hat{O}$ respectively.

Definition 6 (Selection for Reproduction). The Selection for Reproduction
operator is defined by:

$$O = \sigma(I) = \bigcup_{i=1}^{|O|} \{\{x_i\}\}$$

where $x_i$ are values of a sequence of random variables $X_i$ with values in $I$ and a
joint distribution $p(x_1, \ldots, x_{|O|})$.

A sample of each variable $X_i$ produces one element of the input population.
Therefore, with generality, the output population will be obtained by a sequence
of $|O|$ samples of random variables (deterministic methods will have only one
possible outcome).

Support sets of the input and output populations of the $\sigma$ operator also verify

$$|\hat{O}| \le |\hat{I}|$$

meaning that the number of distinct elements output to the Reproduction module
is less than or equal to the number of distinct elements in the source population.

Thus, the net effect of this operator is to reduce the diversity of the population
for reproduction. Normally it will retain, in a non-deterministic way, the more
fit individuals, possibly with multiple identicals of some of them.

Different $\sigma$ functions are now characterized by appropriate definitions of the
random variables $X_i$.



Example 1. Simple roulette selection, $\sigma_1$. Variables are independent and
identically distributed (i.i.d.). The probability mass function $p(x)$ in this case is
defined by

$$p(x = \omega) = \frac{f(\omega)\, I(\omega)}{\sum_{j=1}^{|\hat{I}|} f(\omega_j)\, I(\omega_j)}$$

where $f(\omega)$ represents the fitness value of individual $\omega$. Notice that, since a
population is a multiset over its support, only the individuals in the support
are distinguishable. We cannot distinguish multiple identicals in a population
under the multiset formalism. Therefore, in the equation above, the fitness value
for each element $\omega$ of the support $\hat{I}$ is weighted by the number of its identicals
$I(\omega)$ in the input population.
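
A minimal sketch of roulette selection over a multiset, assuming strictly positive fitness values and a population stored as a `Counter`. The function names `roulette_probabilities` and `roulette_select` are hypothetical.

```python
import random
from collections import Counter


def roulette_probabilities(population, fitness):
    """p(w) = f(w) * I(w) / sum_j f(w_j) * I(w_j), taken over the support."""
    weights = {w: fitness(w) * count for w, count in population.items()}
    total = sum(weights.values())
    return {w: weight / total for w, weight in weights.items()}


def roulette_select(population, fitness, size, rng=random):
    """Draw `size` i.i.d. samples and return them as a multiset (Counter)."""
    probs = roulette_probabilities(population, fitness)
    individuals = list(probs)
    draws = rng.choices(individuals, weights=[probs[w] for w in individuals], k=size)
    return Counter(draws)
```

Because sampling is over the support, the output can only contain individuals already present in the input, which is why the support of the output never exceeds that of the input.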

Example 2. Tournament selection, $\sigma_2$. It is obtained by repeating $|O|$ times: pick
one individual from the input population ($\omega_a$); pick another individual from the
input population ($\omega_b$); from these two individuals, select the one with higher
fitness (in case of equality pick the first one). It can be shown that the probability
of individual $\omega$ being selected is



p(w) = 



J{uj) 






(|/| — rank{uj)) x 



E 



/(a 



+ 



E 



/(a 









where $rank(\omega)$ is the position of individual $\omega$ in the support sorted in descending
order.



3.2 Decimation 



Decimation is an operator that selects individuals based not only on their fitness
values, but also based on their individual memory values (similar to Koza's
[Koz92]). In fact, in this scheme, it is the only selection operator that takes into
account the individual's memory.

Population decimation can be built based on individual decimation which is 
defined as: 



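Definition 7. Individual decimation is a function $\delta_\theta$ such that

$$\delta_\theta(\omega) = \begin{cases} \emptyset & \text{if } \theta(\omega) \\ \omega & \text{otherwise} \end{cases}$$

where $\theta$ is a boolean function $\theta : \Omega \to \{true, false\}$.

Based on the previous definition we define population decimation:
Definition 8. Population decimation is a function $\delta_\theta : \mathcal{M}(\Omega) \to \mathcal{M}(\Omega)$ where

$$\delta_\theta(\emptyset) = \emptyset$$
$$\delta_\theta(\{\{\omega\}\}) = \{\{\delta_\theta(\omega)\}\}$$
$$\delta_\theta(X \cup Y) = \delta_\theta(X) \cup \delta_\theta(Y).$$

We next show $\delta_\theta$ for some usual decimation operators, identifying the
corresponding $\theta$ functions.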

Example 3. Annealing decimation [KG83]

$$\theta_1((c, m)) = \begin{cases} \text{false} & \text{if } rand(1) < \exp(-\,\cdots) \\ \text{true} & \text{otherwise} \end{cases}$$

where $f(c)$ is the fitness value of the chromosome $c$, $m$ is the memory of the
chromosome (the fitness value of the parent) and $T$ is the annealing temperature.



Example 4. Tabu decimation [GL95]

$$\theta_2((c, m)) = \begin{cases} \text{false} & \text{if not } tabu(m) \\ \text{true} & \text{otherwise} \end{cases}$$

where $tabu(m)$ is a boolean function which is true if the individual is in tabu.
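
Population decimation amounts to filtering a multiset with the predicate $\theta$. The sketch below assumes a population stored as a `Counter` of `(chromosome, memory)` pairs; the memory layout used by `make_tabu_theta` (the memory is the move that produced the individual) is a hypothetical choice made only for illustration.

```python
from collections import Counter


def decimate(population, theta):
    """Remove every individual w for which the predicate theta(w) is true."""
    return Counter({w: count for w, count in population.items() if not theta(w)})


def make_tabu_theta(tabu_list):
    # Hypothetical memory layout: the memory m is the move that produced the
    # individual, and the individual is tabu when that move is in `tabu_list`.
    def theta(individual):
        chromosome, memory = individual
        return memory in tabu_list
    return theta
```

An annealing-style predicate could be plugged into `decimate` in the same way, since the operator only relies on $\theta$ returning true for individuals to be removed.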

3.3 Replacement

The Replacement operator ($\rho$) is responsible for the last phase of each iteration
of the optimization algorithm. It selects members from the actual population
and elements produced in the current iteration of the algorithm to form the
population for the next generation (similar to Fogel's “Competition” [Fog95]).

Definition 9 (Replacement). Replacement is a two argument function on
populations:

$$\rho : \mathcal{M}(\Omega) \times \mathcal{M}(\Omega) \to \mathcal{M}(\Omega).$$

The two input populations are not necessarily handled in the same way by
the operator; therefore the operator is further decomposed into two modules,
Choice and Integration (see Fig. 3), allowing for the necessary differentiation.
Input $I_1$ corresponds to the observable population $P_0$ used in the beginning of
the present iteration, while input $I_2$ is the population resulting from the previous
operator (Decimation) in the cycle (see Fig. 2).




Fig. 3. Replacement: $O = \rho(I_1, I_2)$



The Choice operation selects some elements ($I_1'$) from the actual population
($I_1$) to go directly to the next generation. The remaining elements of the actual
population ($I_1''$) are passed on to the Integration module by taking the result
of the operation $I_1 \setminus I_1'$. This module produces population $I_2'$ to complete the next
generation population. It selects individuals from population $I_2$, produced in the
current algorithm iteration, and from population $I_1''$ received from the Choice
submodule. Finally, the output of the Replacement operator ($O$) will constitute the
next generation population, obtained by the union of populations $I_1'$ and $I_2'$.

With such a structure, the Replacement operator may preserve a defined 
percentage of the actual population and have its remaining elements competing 
with the ones produced during the current iteration, to go through to the next 
generation. In case the operator takes both input populations indistinctly we
will have $I_1' = \emptyset$ and thus $I_1'' = I_1$.

A final remark regarding population diversity: we have

$$|\hat{O}| \le |\hat{I}_1 \cup \hat{I}_2|$$

although nothing can be said about the population diversity of two consecutive
iterations ($|\hat{O}|$ and $|\hat{I}_1|$).

Example 5. Total population replacement, $\rho_1$. With constant size populations,
this case corresponds to a gap value ([Gol89]) $G = 1$. The module operations
are defined as follows:

$$Choice(I_1) = \emptyset \Rightarrow I_1' = \emptyset,\ I_1'' = I_1$$
$$Integration(I_1'', I_2) = I_2 \Rightarrow I_2' = I_2$$

thus the replacement operator will be $\rho_1(I_1, I_2) = I_2$.



Example 6. Evolution Strategies with a ($\mu$, $\lambda$) model ([Sch95]), $\rho_2$. Notice that
in this case $|I_1| = \mu$ and $|I_2| = \lambda$. The module operations are defined as follows:

$$Choice(I_1) = \emptyset \Rightarrow I_1' = \emptyset,\ I_1'' = I_1$$
$$Integration(I_1'', I_2) = \mu\_best(I_2),$$

where $\mu\_best$ is a function returning the $\mu$ best individuals of a population.
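
The ($\mu$, $\lambda$) replacement can be sketched as below, assuming that higher fitness is better and that populations are stored as `Counter` multisets. The names `mu_best` and `replace_comma` are hypothetical.

```python
from collections import Counter


def mu_best(population, fitness, mu):
    """Return the `mu` fittest individuals of a multiset, keeping repetitions."""
    ranked = sorted(population.elements(), key=fitness, reverse=True)
    return Counter(ranked[:mu])


def replace_comma(parents, offspring, fitness):
    """(mu, lambda) replacement: Choice keeps nothing from the parents,
    Integration keeps the mu best offspring."""
    mu = sum(parents.values())
    return mu_best(offspring, fitness, mu)
```

The parents contribute only their number $\mu$, which mirrors the empty Choice module of this example.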



4 Individual Based Operators 

Individual based operators may subdivide into two operator classes: Reproduc- 
tion and Mutation. 

4.1 Reproduction 

Different sorts of reproduction have been considered in evolutionary algorithms. 
Existing reproduction operators may use one (sometimes referred to as
neighborhood [GL95]), two (also known as crossover [Gol89]) or more parents (which
has also been named orgy [EvKK95]).

We next show how to accommodate those approaches within this framework. 
We first introduce the general case of n-parental reproduction and then we show 
that it gracefully falls into the particular cases of one parent and two parents 
reproduction. 

Definition 10 (n-parental reproduction). A single n-parental reproduction
is an operation $\tau_\pi^{(n)} : \mathcal{M}(\Omega) \to \mathcal{M}(\Omega)$ involving $m \le n$ individuals, which
generates $K$ descendants.

Let $P = \{\{\omega_1, \ldots, \omega_m\}\}$ with $m \le n$; then $\tau_\pi^{(n)}$ is defined by:

$$\tau_\pi^{(n)}(P) = \begin{cases} P & \text{if } m < n \\ P & \text{if } m = n \wedge rand(1) > p_r \\ \pi(\omega_1, \ldots, \omega_{m=n}) & \text{otherwise,} \end{cases}$$

where $\pi : \Omega^n \to \mathcal{M}(\Omega)$ and $p_r$ is a user defined reproduction probability.
Function $\pi$ will generate $K$ descendants from $n$ parents. Notice that the function $\pi$
is the only one, in this framework, that may change the value of the individual
memory.

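Definition 11 (Population n-parental reprod.). Based on the previous
definition, we define n-parental reproduction over populations
$\bar{\tau}_\pi^{(n)} : \mathcal{M}(\Omega) \to \mathcal{M}(\Omega)$ as:

$$\bar{\tau}_\pi^{(n)}(\emptyset) = \emptyset \qquad (2)$$
$$\bar{\tau}_\pi^{(n)}(X) = \tau_\pi^{(n)}(X) \quad \text{if } |X| < n \qquad (3)$$
$$\bar{\tau}_\pi^{(n)}(X \cup Y) = \tau_\pi^{(n)}(X) \cup \bar{\tau}_\pi^{(n)}(Y) \quad \text{if } |X| = n$$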

In the case of mono-parental reproduction, n = 1, equation [3] above reduces to 
equation 

Example 7. Mono-parental reproduction - the case of a distance 1 neighborhood
without memory [Ree95]. Let us consider individuals with a bit string
chromosome $c = (c_1, \ldots, c_l)$. Then a mono-parental reproduction function generates $K$
descendants from one parent:

$$\pi_1(c) = \bigcup_{i=1}^{K} \{\{X_i|c\}\}$$

The conditional random variables $X_i|c$, independent and identically distributed,
are defined as:

$$(X|c,\ p(x|c),\ S), \qquad (4)$$

where the variable $X|c$ takes values in $S$ with a probability distribution $p(x|c)$.
$S$ is a subset of $\mathcal{C}$ defined in this case as:

$$S = \{x : x = (c_1, \ldots, c_l) \oplus (y_1, \ldots, y_l),\ y_i \in \{0, 1\},\ \sum_{i=1}^{l} y_i = 1\}$$

and $p(x|c) = 1/l$, where $\oplus$ is the bitwise sum modulo two.

Example 8. Mono-parental reproduction - the case of cloning. In this case the
reproduction function is defined as:

$$\pi_2((c, m)) = \bigcup_{i=1}^{K} \{\{(c, m_i)\}\}$$

Notice that in this example memory is manipulated while the chromosome is
maintained identical to the parent's chromosome.

Example 9. Mono-parental reproduction - the case of tabu search with a swapping
neighborhood operator (see also [GL95]). The reproduction function is

$$\pi_3((c, m)) = \bigcup_{i=1}^{K} \{\{XY \mid (c, m)\}\}.$$

In this case we have two joint random variables:

$$(XY \mid (c, m),\ p(xy \mid (c, m)),\ Q), \qquad (5)$$

where $Q$ is a subset of $\Omega$:

$$Q = \{(x, y) : x_r = c_r,\ r = 1, \ldots, l \wedge r \neq i \wedge r \neq j \wedge x_i = c_j \wedge x_j = c_i \wedge y = tabu\_update(m, i, j)\}$$

and $p(xy \mid (c, m)) =$

Example 10. Bi-parental reproduction without memory, $K = 2$ and one-point
crossover. This is the more traditional version of the reproduction operator.
Given any two chromosomes it generates two chromosomes from the genetic
material of the parents. Let $\omega_a$ and $\omega_b$ be two individuals, both with no memory:

$$\omega_a = ((c_{a1}, \ldots, c_{ai}, c_{a,i+1}, \ldots, c_{al}), \emptyset) \quad \text{and} \quad \omega_b = ((c_{b1}, \ldots, c_{bi}, c_{b,i+1}, \ldots, c_{bl}), \emptyset),$$

with $1 \le i < l$.

Then traditional reproduction is (with $i = int((l - 1)\, rand(1)) + 1$):

$$\pi_4(\omega_a, \omega_b) = \{\{((c_{a1}, \ldots, c_{ai}, c_{b,i+1}, \ldots, c_{bl}), \emptyset),\ ((c_{b1}, \ldots, c_{bi}, c_{a,i+1}, \ldots, c_{al}), \emptyset)\}\}$$
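
A minimal sketch of one-point crossover on memoryless individuals, with chromosomes represented as tuples of equal length (at least 2). The function name is hypothetical; the cut point follows the expression for $i$ given in the example.

```python
import random


def one_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length chromosomes after a random cut point i."""
    length = len(parent_a)
    i = int((length - 1) * rng.random()) + 1  # 1 <= i <= length - 1
    child_1 = parent_a[:i] + parent_b[i:]
    child_2 = parent_b[:i] + parent_a[i:]
    return child_1, child_2
```

Because the cut point is never 0 or `length`, each child always receives at least one gene from each parent.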



4.2 Mutation 



Mutation is an operator which randomly changes the individual chromosomes. 
It does not affect individual memory. As a population operator it is based on 
an individual operator that outputs one individual for each input. This and the 
fact that it does not change the memory are the two aspects that, quite clearly, 
distinguish Mutation from mono-parental reproduction. 

Definition 12. Chromosome mutation is a non-deterministic function defined
by

$$\gamma : \mathcal{C} \to \mathcal{C},$$

where this function is defined by the random variable

$$(X|c,\ p(x|c),\ S_\gamma)$$

and the random variable $X|c$ takes values in $S_\gamma$ (a subset of $\mathcal{C}$) with a probability
distribution $p(x|c)$.

Definition 13 (Individual mutation). Individual mutation is a non-deterministic
function $\mu_\gamma : \Omega \to \Omega$ defined by

$$\mu_\gamma((c, m)) = \begin{cases} (c, m) & \text{if } rand(1) > p_m \\ (\gamma(c), m) & \text{otherwise} \end{cases}$$

where $(c, m)$ is an individual $\omega \in \Omega$, with chromosome $c$ and memory $m$, $p_m$ is the
mutation probability and $\gamma(c)$ is the chromosome mutation function.

Definition 14 (Population mutation). The population mutation is a function
$\mu_\gamma : \mathcal{M}(\Omega) \to \mathcal{M}(\Omega)$ defined by

$$\mu_\gamma(\emptyset) = \emptyset$$
$$\mu_\gamma(X \cup Y) = \mu_\gamma(X) \cup \mu_\gamma(Y)$$

Example 11. One bit flip mutation. This type of mutation, $\gamma_1$, is simply defined
by using the same random variable used in Example 7. It flips exactly one bit of
the chromosome.

Example 12. Bitwise mutation, $\gamma_2$. It flips at least one bit in the chromosome. Let $c = (c_1, \ldots, c_l)$ be a string of bits. Then

$$S_{\gamma_2} = \Big\{ x : x = (c_1, \ldots, c_l) \oplus (y_1, \ldots, y_l),\; y_i \in \{0, 1\},\; \sum_{i=1}^{l} y_i > 0 \Big\}$$

and $p(x|c) = q^{\sum_{i=1}^{l} y_i} (1 - q)^{\,l - \sum_{i=1}^{l} y_i}$, where $q$ is the bit flipping probability.
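One possible way to sample such a bitwise mutation is sketched below in Python. It draws a flip mask with independent probability `q` per bit and redraws whenever no bit was selected. This rejection step is an assumption of the sketch: it conditions the distribution on at least one flip, so the resulting probabilities are the stated ones rescaled by a constant. The name `bitwise_mutation` is illustrative.

```python
import random


def bitwise_mutation(chromosome, q, rng=random):
    """Flip each bit independently with probability q, ensuring at least one flip."""
    if not chromosome:
        raise ValueError("chromosome must not be empty")
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    while True:
        # y_i = 1 marks a bit to flip.
        mask = tuple(1 if rng.random() < q else 0 for _ in chromosome)
        if any(mask):  # reject the all-zero mask (assumed handling)
            break
    # x = c XOR y
    return tuple(c ^ y for c, y in zip(chromosome, mask))
```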

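Example 13. Swap mutation, $\gamma_3$. Exchanges two randomly chosen distinct genes. In this case $p(x|c) = 1 / \binom{l}{2}$. Let $c = (c_1, \ldots, c_l)$ be any chromosome. Then

$$S_{\gamma_3} = \{\, x \mid x_r = c_r,\; r = 1, \ldots, l \,\wedge\, i \neq j \,\wedge\, r \neq i \,\wedge\, r \neq j \,\wedge\, x_i = c_j \,\wedge\, x_j = c_i \,\}$$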
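A short Python sketch of swap mutation, assuming chromosomes are tuples and that the two positions are drawn uniformly without replacement (the function name `swap_mutation` is hypothetical):

```python
import random


def swap_mutation(chromosome, rng=random):
    """Exchange the genes at two distinct, uniformly chosen positions."""
    l = len(chromosome)
    if l < 2:
        raise ValueError("need at least two genes to swap")
    i, j = rng.sample(range(l), 2)  # two distinct positions
    genes = list(chromosome)
    genes[i], genes[j] = genes[j], genes[i]
    return tuple(genes)
```

All other positions keep their original genes, so the result is always a permutation of the input chromosome.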



5 The Whole Algorithm 

In the previous section we analysed specific operators. In this section we show how algorithms are implemented using those operators. We use the notation ($\mu$ and $\lambda$) of [Sch95] for population dimensions. The overall template algorithm is (refer to Fig. 2):

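Initialize $P_0, p_r, p_m$
{obeying Constraints}
repeat
    $P_1 \leftarrow \sigma(P_0)$
    $P_2 \leftarrow \pi(P_1)$
    $P_3 \leftarrow \mu_\gamma(P_2)$
    $P_4 \leftarrow \delta_\theta(P_3)$
    $P_0 \leftarrow \rho(P_0, P_4)$
until stop-criteria.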
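The repeat/until template can be read as a higher-order function that takes the operators as parameters. The Python sketch below is one such reading, under assumptions: the five stages (selection, reproduction, mutation, an intermediate filtering step and replacement) follow the listings of this section, and all function and parameter names are hypothetical.

```python
def run_template(p0, select, reproduce, mutate, decimate, replace, stop):
    """Iterate the operator pipeline until the stop criterion holds."""
    population, iteration = p0, 0
    while True:  # repeat ... until: the body runs at least once
        p1 = select(population)
        p2 = reproduce(p1)
        p3 = mutate(p2)
        p4 = decimate(p3)
        population = replace(population, p4)
        iteration += 1
        if stop(population, iteration):
            return population


# Toy usage: climb towards x = 5 on the integers with a single individual.
def fitness(x):
    return -(x - 5) ** 2


def identity(population):
    return population


def keep_better(p0, p4):
    # Keep the parent only if it is strictly better than the offspring.
    return p0 if fitness(p0[0]) > fitness(p4[0]) else p4


best = run_template(
    [0],
    select=identity,
    reproduce=lambda p: [x + 1 for x in p],
    mutate=identity,
    decimate=identity,
    replace=keep_better,
    stop=lambda population, iteration: iteration >= 20,
)
```

Swapping the callables is all that is needed to move from one heuristic to another, which is the point of the template.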

5.1 Simple Genetic Algorithm - SGA ($\mu$, $\lambda$) with $\mu = \lambda$

Initialize: $P_0, p_r, p_m$
{Constraints: $p_r \neq 0$, $p_m \neq 0$}
repeat
    $P_1 \leftarrow \sigma_1(P_0)$              /* cf. the example defining $\sigma_1$ */
    $P_2 \leftarrow \pi_4(P_1)$                 /* cf. Ex. 10 for $\pi_4$ */
    $P_3 \leftarrow \mu_{\gamma_2}(P_2)$        /* cf. Ex. 12 for $\gamma_2$ */
    $P_4 \leftarrow P_3$                        /* $\forall \omega \in P_3$, (...) holds */
    $P_0 \leftarrow \rho_1(P_0, P_4)$           /* cf. Ex. 5 for $\rho_1$ */
until stop-criteria.



5.2 Tabu Search (1+1) 



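Initialize: $P_0$, $p_r \leftarrow 1$, $p_m \leftarrow 0$
{Constraints: $|P_0| = 1$}
repeat
    $P_1 \leftarrow P_0$                          /* $\forall \omega \in P_0,\ p(\omega) = 1$ */
    $P_2 \leftarrow \pi_3(P_1)$                   /* cf. the example defining $\pi_3$ */
    $P_3 \leftarrow P_2$                          /* $\forall \omega \in P_2,\ \neg p_\gamma(\omega)$ holds */
    $P_4 \leftarrow \delta_{\theta_2}(P_3)$       /* cf. the example defining $\theta_2$ */
    $P_0 \leftarrow \rho_2(P_0, P_4)$             /* cf. the example defining $\rho_2$ */
until stop-criteria.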



In this case the replacement operator $P_0 = \rho(P_0, P_4)$ may be further defined as:

$$\mathrm{choice}(P_0) = \emptyset$$

$$\mathrm{integration}(P_0, P_4) =_{\mathrm{def}} \begin{cases} P_0 & \text{if } f(x) > f(y),\ x \in P_0,\ y \in P_4 \\ P_4 & \text{otherwise} \end{cases}$$

Notice that a generalized Tabu Search with $|P_0| > 1$ is automatically accommodated in this algorithm by simply dropping the constraint.



5.3 Hill Climbing (1+1)

Initialize: $P_0$, $p_r \leftarrow 1$, $p_m \leftarrow 0$
{Constraints: $|P_0| = 1$, $K = 1$}
repeat
    $P_1 \leftarrow P_0$                          /* $\forall \omega \in P_0,\ p(\omega) = 1$ */
    $P_2 \leftarrow \pi_1(P_1)$                   /* cf. Ex. 7 */
    $P_3 \leftarrow P_2$                          /* $\forall \omega \in P_2,\ \neg p_\gamma(\omega)$ holds */
    $P_4 \leftarrow \delta_{\theta_2}(P_3)$       /* cf. the example defining $\theta_2$ */
    $P_0 \leftarrow \rho_2(P_0, P_4)$             /* cf. the example defining $\rho_2$ */
until stop-criteria.

Notice that several forms of neighborhood may be used just by changing the reproduction operator $\pi$. The replacement operator is defined as in Tabu Search above.
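As an illustration, a (1+1) hill climber over bit strings might look like the Python sketch below. The one-bit-flip neighbourhood and the tie-breaking rule (the offspring replaces the parent unless the parent is strictly better) are assumptions based on the replacement operator described for Tabu Search. The name `hill_climb` is hypothetical.

```python
import random


def hill_climb(start, fitness, iterations, rng=random):
    """(1+1) hill climbing on a tuple of bits with a one-bit-flip neighbourhood."""
    current = start
    for _ in range(iterations):
        # Reproduction: one child obtained by flipping a single random bit.
        k = rng.randrange(len(current))
        neighbour = current[:k] + (1 - current[k],) + current[k + 1:]
        # Replacement: keep the parent only if it is strictly better.
        if not fitness(current) > fitness(neighbour):
            current = neighbour
    return current
```

A different neighbourhood is obtained by replacing only the two lines that build `neighbour`.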

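5.4 Hill Climbing (1 + $\lambda$)

Initialize: $P_0$, $p_r \leftarrow 1$, $p_m \leftarrow 0$
{Constraints: $|P_0| = 1$, $K = \lambda$}
repeat
    $P_1 \leftarrow P_0$                          /* $\forall \omega \in P_0,\ p(\omega) = 1$ */
    $P_2 \leftarrow \pi_1(P_1)$                   /* cf. Ex. 7 with $K = \lambda$ */
    $P_3 \leftarrow P_2$                          /* $\forall \omega \in P_2,\ \neg p_\gamma(\omega)$ holds */
    $P_4 \leftarrow \delta_{\theta_2}(P_3)$       /* cf. the example defining $\theta_2$ */
    $P_0 \leftarrow \rho_2(P_0, P_4)$             /* cf. the example defining $\rho_2$ */
until stop-criteria.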



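In this case the integration operation in replacement is defined as:

$$\mathrm{int}(P_0, P_4) = \begin{cases} P_0 & \text{if } f(x) > f(y),\ x \in P_0,\ y \in P_4 \\ \{\{y_a\}\} : f(y_a) \ge f(y_i),\ \forall y_i,\ y_a \in P_4 & \text{otherwise} \end{cases}$$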
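This integration rule can be sketched in Python as below. The sketch assumes that the first case requires the parent to be strictly better than every offspring, and that ties among the best offspring are broken by position. The name `integrate` is illustrative.

```python
def integrate(parent, offspring, fitness):
    """Return the surviving one-element population for the (1 + lambda) scheme."""
    # Best offspring; max() returns the first maximal element on ties.
    best_child = max(offspring, key=fitness)
    # The parent survives only if it strictly beats every offspring,
    # which is equivalent to strictly beating the best one.
    if fitness(parent) > fitness(best_child):
        return [parent]
    return [best_child]
```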



6 Conclusions and Future Work 

We presented a formal framework for expressing optimization heuristics and showed how some well-known existing heuristics can be described within this framework. It proved general enough to cope with different paradigms, using only a small set of operators.

In the future, other heuristics will be expressed in the framework. This will provide feedback for further refinements and possible improvements of the framework. It is an open issue whether this formalism can accommodate non-linear representations (trees and graphs).

Time-evolving analysis has to be done, as well as comparison among heuristics. One expected result is a portrait of a hierarchy of heuristic methods.

The examples presented here are (mainly) memoryless algorithms, and special attention will be given to that issue in a forthcoming report.

The framework presented can also be seen as a specification for an implementation that is yet to be done. A prototype based on the framework is expected to have clear semantics, and to be fairly modular and scalable, providing a basis for a heuristics programming environment.

One interesting aspect to explore here is to have the implementation work with the multiset representation. In our opinion this will provide a larger population diversity along the iterations of the algorithm. We have used, informally, a single notion of diversity. Other, more solid, diversity definitions should be developed.






Another aspect that will be dealt with is the development of the control module (cf. Fig. 1). It will provide, amongst other things, a way to consider runtime results and to change operators and parameter values for better performance.

Annex

Definitions below are taken from [Fer95].

Definition 15 (Multiset). Let $S$ be any set. A finite multiset over $S$ is a function $\rho : S \to \mathbb{N}$ such that the set $\{s \in S \mid \rho(s) \neq 0\}$ is finite. The set of all finite multisets over $S$ is denoted by $\mathcal{M}(S)$.

We will use a set-like notation $\{\{\ \}\}$ to denote a multiset. Operations similar to the ones applied on sets (e.g. $\in$, $\cup$, $\subseteq$, etc.) are also applied to multisets. We will use round symbols to denote operations on sets (e.g. $\subseteq$) and similar square symbols for the same operation on multisets (e.g. $\sqsubseteq$), whenever possible. Some operations, like $\in$, will be denoted ambiguously by the same symbol. In the following we abbreviate finite multiset to multiset.

Definition 16. Let $\rho$, $\pi$ be arbitrary multisets over $S$. The operations $\in$, $\sqcup$, $\sqsubseteq$, $\setminus$, $\sqcap$ on $\mathcal{M}(S)$, the set of finite multisets, are defined as follows:

- $\forall s \in S : s \in \rho \iff \rho(s) > 0$,

- $\rho \sqcup \pi$ is the multiset defined by $(\rho \sqcup \pi)(s) = \rho(s) + \pi(s)$, for all $s \in S$,

- $\rho \sqsubseteq \pi \iff \forall s \in S : \rho(s) \le \pi(s)$. If the last inequality is strict for all $s \in S$ then we have strict inclusion of multisets, i.e., $\rho \sqsubset \pi$,

- $\rho \sqcap \pi$ is the multiset defined by $(\rho \sqcap \pi)(s) = \min(\rho(s), \pi(s))$, for all $s \in S$,

- $\rho \setminus \pi$ is the multiset defined by $(\rho \setminus \pi)(s) = \max(\rho(s) - \pi(s), 0)$, for all $s \in S$.
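These multiset operations map closely onto Python's `collections.Counter`, which is one plausible choice for the multiset representation in a prototype. The sketch below is only that; the wrapper names (`member`, `msum`, `included`, `mintersection`, `mdifference`) are hypothetical.

```python
from collections import Counter


def member(s, rho):
    """s belongs to rho iff its multiplicity is positive."""
    return rho[s] > 0


def msum(rho, pi):
    """Multiset union as defined above: multiplicities are added."""
    return rho + pi


def included(rho, pi):
    """Inclusion: every multiplicity in rho is at most the one in pi."""
    return all(count <= pi[s] for s, count in rho.items())


def mintersection(rho, pi):
    """Intersection: pointwise minimum of multiplicities."""
    return rho & pi


def mdifference(rho, pi):
    """Difference: pointwise max(rho(s) - pi(s), 0)."""
    return rho - pi
```

`Counter` returns 0 for absent elements and drops non-positive counts after `+`, `&` and `-`, which matches the finite-support convention of Definition 15.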



Definition 17. Let $\Phi : A \to \mathcal{M}(A)$ be a function. This function is extended to a function $\Phi : \mathcal{M}(A) \to \mathcal{M}(A)$ as follows:

- $\Phi(\emptyset) = \emptyset$,

- $\Phi(\{\{a\}\}) = \Phi(a)$,

- $\Phi(X \sqcup Y) = \Phi(X) \sqcup \Phi(Y)$.